giabaio / survHE

Survival analysis in health economic evaluation. Contains a suite of functions to systematise the workflow involving survival analysis in health economic evaluation. survHE can fit a large range of survival models using both a frequentist approach (by calling the R package flexsurv) and a Bayesian perspective.
https://gianluca.statistica.it/software/survhe/

Formatting of inputs for digitise function #24

Closed NatalieSoto closed 1 year ago

NatalieSoto commented 5 years ago

I've digitised survival data using DigitizeIt, but I've had some issues using the digitise function in the survHE package. I've managed to get the formatting of the survival input txt file right, but I'm not sure where to get the information needed for the 5 columns of the number-at-risk txt file. Is there a pre-processing step that I'm still missing? Looking at the function, it seems that this might be the type of table required:

  time n.risk n.event n.censor      surv     upper     lower
1   11    138       3        0 0.9782609 1.0000000 0.9542301
2   12    135       1        0 0.9710145 0.9994124 0.9434235
3   13    134       2        0 0.9565217 0.9911586 0.9230952
4   15    132       1        0 0.9492754 0.9866017 0.9133612
5   26    131       1        0 0.9420290 0.9818365 0.9038355
6   30    130       1        0 0.9347826 0.9768989 0.8944820

Thanks and again, apologies if I'm just missing something!

Natalie

NatalieSoto commented 5 years ago

I thought I had found the issue, but still need help.

giabaio commented 5 years ago

Hi Natalie, I am a bit confused... sorry. Are you using DigitizeIT to map the Kaplan Meier curves into a file with the relevant data? digitise takes as input files generated by using DigitizeIT --- which is by no means the only one you can use, but it was the one used by the colleagues I was working with when developing this bit of survHE and so that's what I used too. No reason why this can't be expanded to other formats to then plug into digitise --- but I'm not sure whether that's your issue in the first place?

NatalieSoto commented 5 years ago

Yes, I apologise, I wasn't clear. I used digitise but had no issues with the surv_inp argument. The nrisk_inp gave me this error "Error in [.data.frame(pub.risk, , 5) : undefined columns selected" because I was just using the information on number at risk that I had. But looking at the function, it seems to require a 5 column table like the one I pasted that I could get with the survfit function. But I don't have the number of deaths or number censored, just number at risk, so I didn't know where to get that information without processing the digitized curve, maybe counting the symbols myself. The information I have is in the attached picture.

Thank you!

[Attached image: curva1]

giabaio commented 5 years ago

So just to be clear. You do have the graph you sent, but you're not using any digitising software? I think the process as intended would be the following.

  1. obtain the graph with the KM curve (as you have). Ideally you need a table of no. at risk (as you do).
  2. use a digitisation software (eg DigitizeIT). This effectively entails you pointing and clicking on the graph several times. The software would reconstruct the values on the curve and generate two files. The first one (say called survival.txt) is something like this

        Time    Survival
1       0.00    1.00
2       2.89    1.00
3       2.94    0.98
4       3.02    0.97
5       3.02    0.96
6       3.21    0.96
7       3.43    0.94
8       4.77    0.92
9       4.84    0.91

while the second one (say nrisk.txt) is something like this

Interval  time  lower  upper  nrisk
1         0     1      7      61
2         4     8      14     55
3         8     15     21     49
4         12    22     27     31
5         16    28     31     13

It is those files that you need as input to digitise so it can recreate the IPD, eg using something like

library(survHE)
digitise(surv_inp="survival.txt",nrisk_inp="nrisk.txt",
            km_output="KMdata.txt",ipd_output="IPDdata.txt")
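
For anyone who wants to check the layout of the two files before calling digitise, here is a minimal sketch that writes the excerpts above to disk and reads them back. It assumes tab-separated text is acceptable and that the columns are picked up by position (the `pub.risk[, 5]` error quoted earlier in the thread suggests they are); the `ID` column name and the temporary file paths are illustrative choices only. Note that the survival excerpt stops at row 9 while the nrisk excerpt refers to rows up to 31, so these excerpts alone are not a complete input.

```r
# Values copied from the excerpts above. "ID" is a hypothetical name for the
# row-index column; it is not something survHE is known to require.
surv <- data.frame(
  ID       = 1:9,
  Time     = c(0.00, 2.89, 2.94, 3.02, 3.02, 3.21, 3.43, 4.77, 4.84),
  Survival = c(1.00, 1.00, 0.98, 0.97, 0.96, 0.96, 0.94, 0.92, 0.91)
)
nrisk <- data.frame(
  Interval = 1:5,
  time     = c(0, 4, 8, 12, 16),
  lower    = c(1, 8, 15, 22, 28),
  upper    = c(7, 14, 21, 27, 31),
  nrisk    = c(61, 55, 49, 31, 13)
)

# Write both tables as tab-separated text in a temporary directory
surv_file  <- file.path(tempdir(), "survival.txt")
nrisk_file <- file.path(tempdir(), "nrisk.txt")
write.table(surv,  surv_file,  sep = "\t", row.names = FALSE, quote = FALSE)
write.table(nrisk, nrisk_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read them back and confirm the shape: 3 columns and 5 columns respectively
surv_check  <- read.table(surv_file,  header = TRUE)
nrisk_check <- read.table(nrisk_file, header = TRUE)
stopifnot(ncol(surv_check) == 3, ncol(nrisk_check) == 5)

# With complete files, the call would then be the one shown above, e.g.
# digitise(surv_inp = surv_file, nrisk_inp = nrisk_file,
#          km_output = "KMdata.txt", ipd_output = "IPDdata.txt")
```

A number-at-risk file with fewer than 5 columns would fail the second check, which matches the "undefined columns selected" error reported above.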
NatalieSoto commented 5 years ago

I see the issue now, I was using digitising software, but it was only producing the first file. Will check on that. Apologies for my confusion.

Thank you!

Natalie

giabaio commented 5 years ago

No probs. Let me know if all works!

NatalieSoto commented 5 years ago

DigitizeIt seems to only produce the csv for the KM curve. Again, I couldn't find a way to get the second file. But I found this article, which uses the information I have (also from DigitizeIt) and produces the rest from the digitised curve through the provided R code: https://www.sciencedirect.com/science/article/pii/S2352340917301968?

https://doi.org/10.1016/j.dib.2017.05.005

Hopefully it will work out!

thanks!

arlondhe commented 4 years ago

Can you please provide the full sample files to try? I created my files following the directions above but still get an error from the digitise function: "Error in rep(t.S[j], d[j]) : invalid 'times' argument". I just want to see it work fully with sample data. Thanks.

giabaio commented 4 years ago

There's another issue with this problem --- have you checked that the digitised values are consistent with the monotonicity assumption in the survival curves?
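
A quick way to run that check in R is sketched below. It is only an illustration: `check_km` is a hypothetical helper, not part of survHE, and the `cummin` repair at the end is one crude option (it flattens any upward blip to the previous value) rather than a recommended fix.

```r
# Hypothetical helper: checks that digitised times never go backwards and
# that the survival values never increase from one row to the next.
check_km <- function(time, surv) {
  list(
    time_ok  = !is.unsorted(time),           # TRUE if times are non-decreasing
    surv_ok  = all(diff(surv) <= 0),         # TRUE if survival is non-increasing
    bad_rows = which(diff(surv) > 0) + 1L    # rows where survival goes up
  )
}

# Made-up example with one click that landed slightly too high (row 3)
time <- c(0, 1, 2, 3, 4)
surv <- c(1.00, 0.95, 0.96, 0.90, 0.85)

res <- check_km(time, surv)
print(res$bad_rows)

# Crude repair: force the curve to be non-increasing
surv_fixed <- cummin(surv)
```

If `bad_rows` is not empty, re-clicking those points in the digitising software is probably safer than repairing them automatically.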

CaptainKoop commented 1 year ago

Hi Gianluca, I have used the digitising software to obtain data like Data1 below (with "Time" and "Survival") from the K-M curve, and I have the "No. at risk" table, but how can I get the "lower" and "upper" columns shown in Data2 below? Are "lower" and "upper" numbers at risk? I failed to find the answer elsewhere; apologies for the basic question.

Data1

        Time    Survival
1       0.00    1.00
2       2.89    1.00
3       2.94    0.98
4       3.02    0.97
5       3.02    0.96
6       3.21    0.96
7       3.43    0.94
8       4.77    0.92
9       4.84    0.91

Data2

Interval  time  lower  upper  nrisk
1         0     1      7      61
2         4     8      14     55
3         8     15     21     49
4         12    22     27     31
5         16    28     31     13

Thanks so much! Koop

giabaio commented 1 year ago

The two files with the survival times and the nrisk are the output of DigitizeIt.
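
If the digitising tool only exports the first file (as reported earlier in this thread), the second one can be put together by hand. The sketch below assumes that "lower" and "upper" are the first and last row numbers of the digitised points falling in each number-at-risk interval. That reading is consistent with the example tables above (row 8, at time 4.77, is the first point at or after time 4, and the interval starting at 4 has lower = 8), but it is an assumption and not confirmed survHE documentation. `make_nrisk` is a hypothetical helper and the input values are made up.

```r
# Hypothetical helper: build the 5-column number-at-risk table from
#   km_time   : digitised times, sorted in increasing order
#   risk_time : times at which the published table reports numbers at risk
#   n_risk    : the reported numbers at risk at those times
make_nrisk <- function(km_time, risk_time, n_risk) {
  stopifnot(!is.unsorted(km_time), !is.unsorted(risk_time),
            length(risk_time) == length(n_risk))
  # Interval index k of each point: risk_time[k] <= t < risk_time[k + 1]
  k <- findInterval(km_time, risk_time)
  rows <- split(seq_along(km_time), factor(k, levels = seq_along(risk_time)))
  first_or_na <- function(r) if (length(r) > 0) min(r) else NA_integer_
  last_or_na  <- function(r) if (length(r) > 0) max(r) else NA_integer_
  out <- data.frame(
    time  = risk_time,
    lower = unname(vapply(rows, first_or_na, integer(1))),
    upper = unname(vapply(rows, last_or_na, integer(1))),
    nrisk = n_risk
  )
  # Assumption: intervals without any digitised point are dropped
  out <- out[!is.na(out$lower), ]
  out <- cbind(Interval = seq_len(nrow(out)), out)
  rownames(out) <- NULL
  out
}

# Made-up inputs: 8 digitised points, numbers at risk reported at 0, 4 and 8
km_time   <- c(0, 1.5, 3.2, 4.1, 5.0, 7.9, 8.3, 9.6)
risk_time <- c(0, 4, 8)
n_risk    <- c(61, 55, 49)

tab <- make_nrisk(km_time, risk_time, n_risk)
print(tab)
```

The resulting data frame can be written out with write.table and passed as nrisk_inp in the same way as the example file above.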

CaptainKoop commented 1 year ago

I got it, thanks a lot.