JuliaSurv / NetSurvival.jl

A pure-Julia take on standard net survival routines
https://juliasurv.github.io/NetSurvival.jl/
MIT License
6 stars 0 forks source link

adding non param methods #25

Closed rimhajal closed 3 months ago

rimhajal commented 3 months ago

this doesn't work, pushing just to publish the branch

codecov[bot] commented 3 months ago

The author of this PR, rimhajal, is not an activated member of this organization on Codecov. Please activate this user on Codecov to display this PR comment. Coverage data is still being uploaded to Codecov.io for purposes of overall coverage calculations. Please don't hesitate to email us at support@codecov.io with any questions.

lrnv commented 3 months ago

Maybe the check against R should be in the tests ? I cannot see it yet.

rimhajal commented 3 months ago

I am checking it manually for now

lrnv commented 3 months ago

In this case I cannot really help you ^^ But if you have it produce the same numbers, then you are on the right track :)

rimhajal commented 3 months ago

seeing that the loops are now different between PoharPerme and EdererI, is it still worth it to merge the lambda function?

lrnv commented 3 months ago

@rimhajal So, at first, I thought that relsurv did simply not implement the summary method for survfit objects, and so using it was ismply wrong (comes from another package, not controlled). But according to their docs there, rs.add returns...

a survfit object; see the help on survfit.object for details. The survfit methods are used for print, summary, plot, lines, and points.

;)

lrnv commented 3 months ago

@rimhajal missed your question. I dont know, maybe we can implement the other ones (ederer 2, hakulinen), and then decide if we want to merge or not ?

rimhajal commented 3 months ago

For the issue with the summary, do we compare with "fit$surv" instead in the tests?

lrnv commented 3 months ago

Yes, compare directly with fit$surv and fit$std.err (while keeping in mind that the time steps are not exactly the same): the summary is j_u_n_k, as we saw friday...

rimhajal commented 3 months ago

Do I change the comparison to PoharPerme as well for coherence?

lrnv commented 3 months ago

Hum... Yes that would be better indeed

rimhajal commented 3 months ago

The test were already not using the summary, Hakulinen is failing with only the surv rate.

lrnv commented 3 months ago

Is it failling by much ? on the begining or the end of the curve ? Remember the other time we plotted it.

rimhajal commented 3 months ago

I've been trying to see what the issue might be but i don't know ... These are the plots, first is Julia second is R plot_julia plot_R

lrnv commented 3 months ago

Can you plot the ratio ? For the moment looks pretty good to me

lrnv commented 3 months ago

Errrr never mind I did not look at the axis correctly. Indeed there is an issue here...

rimhajal commented 3 months ago

plot_err This is the plot anyway

lrnv commented 3 months ago

This is really weird as your implementation looks correct. Maybe we made a mistake in the formula for hakulinen compared to relsurv ?

Edit: looking at relsurv code, it looks like somethign reaaaaaaly different is happening for hakulinen : there is a temp2 stuff with a Y2 special times... It's probably the reason why we do not match. But it's crazy that Nathalie's slides does not report this

rimhajal commented 3 months ago

The Hakulinen method wasn't in Nathalie's slides, I think we decided this was the formula based on the R code

lrnv commented 3 months ago

Bingo. So thid is not the right formula :P or rather, there is more to it than just this formula.

rimhajal commented 3 months ago

if (method == 3) { #need potential follow-up time for Hak. method R <- rform$R coll <- match("year", attributes(ratetable)$dimid) year <- R[, coll] #calendar year in the data if (missing(fin.date)) fin.date <- max(rform$Y + year) #final date for everybody set to the last day observed Y2 <- rform$Y #change into potential follow-up time if (length(fin.date) == 1) #if final date equal for everyone Y2[rform$status == 1] <- fin.date - year[rform$status == 1]#set pot.time for those that died (equal to censoring time for others) else if (length(fin.date) == nrow(rform$R)) Y2[rform$status == 1] <- fin.date[rform$status == 1] - year[rform$status == 1] else stop("fin.date must be either one value or a vector of the same length as the data") status2 <- rep(0, nrow(rform$X)) #stat2=0 for everyone }

Any idea what is a "potential follow-up time"?

rimhajal commented 3 months ago

if (method == 3) { #need potential follow-up time for Hak. method R <- rform$R coll <- match("year", attributes(ratetable)$dimid) year <- R[, coll] #calendar year in the data if (missing(fin.date)) fin.date <- max(rform$Y + year) #final date for everybody set to the last day observed Y2 <- rform$Y #change into potential follow-up time if (length(fin.date) == 1) #if final date equal for everyone Y2[rform$status == 1] <- fin.date - year[rform$status == 1]#set pot.time for those that died (equal to censoring time for others) else if (length(fin.date) == nrow(rform$R)) Y2[rform$status == 1] <- fin.date[rform$status == 1] - year[rform$status == 1] else stop("fin.date must be either one value or a vector of the same length as the data") status2 <- rep(0, nrow(rform$X)) #stat2=0 for everyone }

Any idea what is a "potential follow-up time"?

Hakulinen Method The follow-up time is the actual censoring time for those subjects who are censored and the “maximum potential follow-up” for those who have died. I found this in a random person's slides (he referenced the 1982 article and Voutilainen E. T., Dickmann P. W. and Hakulinen T. SURV2: Relative Survival Analysis Program – Software Manual http://www.cancerregistry.fi/surv2/) He also put the same formula for both EdererI and Hakulinen. ExpRelSurv.pdf

lrnv commented 3 months ago

I'd forget a bit about the names and stuff and concentrate on the R code to extract the meaning "later", when we have the right numbers.

Looks like the "potential" follow-up time means the death time if not censored and the end-of-study time if censored. Then they use a new censoring indicatrix that says that noone is censored. I dont fully understand why for the moment, sorry

lrnv commented 3 months ago

Looks like the right interpretation is :

1) Modify somehox the data to obtain the Y2,status2,temp2 thing. 2) Compute the population part form these data instead.

And the modification looks like :

This looks a bit crazy to me but it's what is in the code.

lrnv commented 3 months ago

Appendices A.1 to A.4 of this paper from hakulinen himself, in 2011, compares ederer 1 & 2 to hakulinen with clearer-ish formulas. Might be a good source to fix up what we want -- or not. Anyway we could cite it

lrnv commented 3 months ago

The tests are erroring because

Error: 3 docstrings not included in the manual:

    NetSurvival.EdererI
    NetSurvival.Hakulinen
    NetSurvival.EdererII