Closed rimhajal closed 3 months ago
The author of this PR, rimhajal, is not an activated member of this organization on Codecov. Please activate this user on Codecov to display this PR comment. Coverage data is still being uploaded to Codecov.io for purposes of overall coverage calculations. Please don't hesitate to email us at support@codecov.io with any questions.
Maybe the check against R should be in the tests ? I cannot see it yet.
I am checking it manually for now
In this case I cannot really help you ^^ But if you have it produce the same numbers, then you are on the right track :)
seeing that the loops are now different between PoharPerme and EdererI, is it still worth it to merge the lambda function?
@rimhajal So, at first, I thought that relsurv did simply not implement the summary method for survfit
objects, and so using it was ismply wrong (comes from another package, not controlled). But according to their docs there, rs.add
returns...
a survfit object; see the help on survfit.object for details. The survfit methods are used for print, summary, plot, lines, and points.
;)
@rimhajal missed your question. I dont know, maybe we can implement the other ones (ederer 2, hakulinen), and then decide if we want to merge or not ?
For the issue with the summary, do we compare with "fit$surv" instead in the tests?
Yes, compare directly with fit$surv and fit$std.err (while keeping in mind that the time steps are not exactly the same): the summary is j_u_n_k, as we saw friday...
Do I change the comparison to PoharPerme as well for coherence?
Hum... Yes that would be better indeed
The test were already not using the summary, Hakulinen is failing with only the surv rate.
Is it failling by much ? on the begining or the end of the curve ? Remember the other time we plotted it.
I've been trying to see what the issue might be but i don't know ... These are the plots, first is Julia second is R
Can you plot the ratio ? For the moment looks pretty good to me
Errrr never mind I did not look at the axis correctly. Indeed there is an issue here...
This is the plot anyway
This is really weird as your implementation looks correct. Maybe we made a mistake in the formula for hakulinen compared to relsurv ?
Edit: looking at relsurv code, it looks like somethign reaaaaaaly different is happening for hakulinen : there is a temp2
stuff with a Y2
special times... It's probably the reason why we do not match. But it's crazy that Nathalie's slides does not report this
The Hakulinen method wasn't in Nathalie's slides, I think we decided this was the formula based on the R code
Bingo. So thid is not the right formula :P or rather, there is more to it than just this formula.
if (method == 3) { #need potential follow-up time for Hak. method R <- rform$R coll <- match("year", attributes(ratetable)$dimid) year <- R[, coll] #calendar year in the data if (missing(fin.date)) fin.date <- max(rform$Y + year) #final date for everybody set to the last day observed Y2 <- rform$Y #change into potential follow-up time if (length(fin.date) == 1) #if final date equal for everyone Y2[rform$status == 1] <- fin.date - year[rform$status == 1]#set pot.time for those that died (equal to censoring time for others) else if (length(fin.date) == nrow(rform$R)) Y2[rform$status == 1] <- fin.date[rform$status == 1] - year[rform$status == 1] else stop("fin.date must be either one value or a vector of the same length as the data") status2 <- rep(0, nrow(rform$X)) #stat2=0 for everyone }
Any idea what is a "potential follow-up time"?
if (method == 3) { #need potential follow-up time for Hak. method R <- rform$R coll <- match("year", attributes(ratetable)$dimid) year <- R[, coll] #calendar year in the data if (missing(fin.date)) fin.date <- max(rform$Y + year) #final date for everybody set to the last day observed Y2 <- rform$Y #change into potential follow-up time if (length(fin.date) == 1) #if final date equal for everyone Y2[rform$status == 1] <- fin.date - year[rform$status == 1]#set pot.time for those that died (equal to censoring time for others) else if (length(fin.date) == nrow(rform$R)) Y2[rform$status == 1] <- fin.date[rform$status == 1] - year[rform$status == 1] else stop("fin.date must be either one value or a vector of the same length as the data") status2 <- rep(0, nrow(rform$X)) #stat2=0 for everyone }
Any idea what is a "potential follow-up time"?
Hakulinen Method The follow-up time is the actual censoring time for those subjects who are censored and the “maximum potential follow-up” for those who have died. I found this in a random person's slides (he referenced the 1982 article and Voutilainen E. T., Dickmann P. W. and Hakulinen T. SURV2: Relative Survival Analysis Program – Software Manual http://www.cancerregistry.fi/surv2/) He also put the same formula for both EdererI and Hakulinen. ExpRelSurv.pdf
I'd forget a bit about the names and stuff and concentrate on the R code to extract the meaning "later", when we have the right numbers.
Looks like the "potential" follow-up time means the death time if not censored and the end-of-study time if censored. Then they use a new censoring indicatrix that says that noone is censored. I dont fully understand why for the moment, sorry
Looks like the right interpretation is :
1) Modify somehox the data to obtain the Y2,status2,temp2
thing.
2) Compute the population part form these data instead.
And the modification looks like :
This looks a bit crazy to me but it's what is in the code.
Appendices A.1 to A.4 of this paper from hakulinen himself, in 2011, compares ederer 1 & 2 to hakulinen with clearer-ish formulas. Might be a good source to fix up what we want -- or not. Anyway we could cite it
The tests are erroring because
Error: 3 docstrings not included in the manual:
NetSurvival.EdererI
NetSurvival.Hakulinen
NetSurvival.EdererII
this doesn't work, pushing just to publish the branch