NatalieSoto closed this issue 1 year ago.
I thought I had found the issue, but still need help.
Hi Natalie,
I am a bit confused, sorry. Are you using DigitizeIt to map the Kaplan-Meier curves into a file with the relevant data? digitise takes as input files generated by DigitizeIt. It is by no means the only software you can use, but it was the one used by the colleagues I was working with when developing this bit of survHE, so that's what I used too. There's no reason why this can't be expanded to other formats that could then be plugged into digitise, but I'm not sure whether that's your issue in the first place.
Yes, I apologise, I wasn't clear. I used digitise and had no issues with the surv_inp argument. The nrisk_inp argument gave me the error "Error in `[.data.frame`(pub.risk, , 5) : undefined columns selected", because I was only supplying the number-at-risk information I had. Looking at the function, it seems to require a 5-column table like the one I pasted, which I could get from the survfit function. But I don't have the number of deaths or the number censored, only the number at risk, so I didn't know where to get that information without processing the digitised curve, perhaps by counting the symbols on it myself. The information I have is in the attached picture.
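For anyone hitting the same message: "undefined columns selected" is the error R raises when a data frame is indexed by a column number it does not have, which fits digitise reading column 5 of a table that only holds times and numbers at risk. A minimal sketch that reproduces the error and checks a file up front, assuming the five-column layout (Interval, time, lower, upper, nrisk) used in this thread; check_nrisk_file is a hypothetical helper, not part of survHE:

```r
# Reproduce the error: a two-column number-at-risk table indexed at column 5
# (illustrative values).
pub.risk <- data.frame(time = c(0, 4), nrisk = c(61, 55))
msg <- tryCatch(pub.risk[, 5], error = function(e) conditionMessage(e))
print(msg)  # "undefined columns selected"

# Hypothetical helper (not part of survHE): fail early with a clearer message
# if an at-risk file does not have the five columns used in this thread.
check_nrisk_file <- function(path) {
  nrisk <- read.table(path, header = TRUE)
  if (ncol(nrisk) < 5) {
    stop(sprintf("%s has %d column(s); expected 5 (Interval, time, lower, upper, nrisk)",
                 basename(path), ncol(nrisk)))
  }
  invisible(nrisk)
}

# Usage with a throwaway file holding only times and numbers at risk
tmp <- tempfile(fileext = ".txt")
writeLines(c("time nrisk", "0 61", "4 55"), tmp)
res <- tryCatch(check_nrisk_file(tmp), error = function(e) conditionMessage(e))
print(res)
```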
Thank you!
So just to be clear: you do have the graph you sent, but you're not using any digitising software? I think the intended process is the following.
1) Obtain the graph with the KM curve (as you have). Ideally you also need a table of the number at risk (as you do).
2) Use digitising software (e.g. DigitizeIt). This essentially means pointing and clicking on the graph several times. The software reconstructs the values on the curve and generates two files. The first one (say survival.txt) looks something like this:
Time Survival
1 0.00 1.00
2 2.89 1.00
3 2.94 0.98
4 3.02 0.97
5 3.02 0.96
6 3.21 0.96
7 3.43 0.94
8 4.77 0.92
9 4.84 0.91
while the second one (say nrisk.txt) looks something like this:
Interval time lower upper nrisk
1 0 1 7 61
2 4 8 14 55
3 8 15 21 49
4 12 22 27 31
5 16 28 31 13
It is those files that you need as input to digitise, so that it can recreate the IPD, e.g. using something like:
library(survHE)
digitise(surv_inp="survival.txt",nrisk_inp="nrisk.txt",
km_output="KMdata.txt",ipd_output="IPDdata.txt")
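Before running the call above, it can help to check that the two files agree with each other. In the example files, the lower/upper pairs 1-7, 8-14, 15-21, 22-27 and 28-31 appear to cover rows 1 to 31 of the survival file without gaps, so a pre-flight check could test exactly that. check_inputs is a hypothetical helper, and its checks are inferred from the example files rather than taken from survHE's code:

```r
# Hypothetical helper (not part of survHE): cross-check the survival table and
# the at-risk table before calling digitise(). In practice you would read them
# with read.table("survival.txt", header = TRUE), etc.
check_inputs <- function(surv, nrisk) {
  problems <- character(0)
  if (nrisk$lower[1] != 1)
    problems <- c(problems, "first 'lower' is not 1")
  if (any(nrisk$lower > nrisk$upper))
    problems <- c(problems, "some interval has 'lower' > 'upper'")
  # each interval should start on the row after the previous one ends
  if (nrow(nrisk) > 1 && any(nrisk$lower[-1] != nrisk$upper[-nrow(nrisk)] + 1))
    problems <- c(problems, "intervals are not contiguous")
  if (max(nrisk$upper) != nrow(surv))
    problems <- c(problems, sprintf("last 'upper' (%d) differs from the %d survival rows",
                                    max(nrisk$upper), nrow(surv)))
  problems
}

# The example values from this thread (only a 9-row excerpt of survival.txt)
surv <- data.frame(Time = c(0.00, 2.89, 2.94, 3.02, 3.02, 3.21, 3.43, 4.77, 4.84),
                   Survival = c(1.00, 1.00, 0.98, 0.97, 0.96, 0.96, 0.94, 0.92, 0.91))
nrisk <- data.frame(Interval = 1:5, time = c(0, 4, 8, 12, 16),
                    lower = c(1, 8, 15, 22, 28), upper = c(7, 14, 21, 27, 31),
                    nrisk = c(61, 55, 49, 31, 13))
print(check_inputs(surv, nrisk))  # flags the row count, since only 9 rows are shown
```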
I see the issue now: I was using digitising software, but it was only producing the first file. I will check on that. Apologies for my confusion.
Thank you!
Natalie
No probs. Let me know if all works!
DigitizeIt seems to produce only the CSV for the KM curve; again, I couldn't find a way to get the second file. But I found this article, which uses the information I have (also from DigitizeIt) and produces the rest from the digitised curve with the R code it provides: https://www.sciencedirect.com/science/article/pii/S2352340917301968?
https://doi.org/10.1016/j.dib.2017.05.005
Hopefully it will work out!
thanks!
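If the digitising software only exports a CSV of the clicked points, it can be converted into the survival.txt layout shown earlier in this thread (a header with Time and Survival, then one numbered row per point). A minimal sketch, assuming the CSV holds two numeric columns, time then survival, with no header row; csv_to_survival_txt is a hypothetical helper, and the actual DigitizeIt export format may differ:

```r
# Hypothetical converter (not part of survHE or DigitizeIt). Assumes the CSV
# has two numeric columns, time then survival; set 'header' and 'sep' to match
# what your export actually contains.
csv_to_survival_txt <- function(csv_path, out_path, header = FALSE, sep = ",") {
  pts <- read.csv(csv_path, header = header, sep = sep)[, 1:2]
  names(pts) <- c("Time", "Survival")
  pts <- pts[order(pts$Time), ]   # keep the points in time order
  rownames(pts) <- NULL           # renumber rows 1, 2, 3, ...
  # row.names = TRUE writes the row number as the first field of each line and
  # a header holding just "Time Survival", as in the survival.txt example
  write.table(pts, out_path, quote = FALSE, row.names = TRUE)
  invisible(pts)
}

# Usage with a throwaway, deliberately unsorted CSV (values from the example)
csv <- tempfile(fileext = ".csv")
writeLines(c("2.89,1.00", "0.00,1.00", "2.94,0.98"), csv)
out <- tempfile(fileext = ".txt")
csv_to_survival_txt(csv, out)
cat(readLines(out), sep = "\n")
```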
Can you please provide complete sample files to try? I created my files following these directions, but I still get an error from the digitise function: "Error in rep(t.S[j], d[j]) : invalid 'times' argument". I'd just like to see it work end to end with sample data. Thanks.
There's another possible issue here: have you checked that the digitised values are consistent with the monotonicity assumption of survival curves (i.e. that survival never increases over time)?
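One way a non-monotone curve can cause this: if survival goes up between two clicked points, an event count derived from it can plausibly come out negative, and rep() rejects a negative times value with an "invalid 'times' argument" error. A minimal sketch of the check suggested above plus one common cleanup (ordering by time and taking a running minimum with cummin); both functions are hypothetical helpers, and re-clicking the offending points may be preferable to silently flattening them:

```r
# Hypothetical helpers (not part of survHE).

# Positions (in time order) where survival is higher than at the previous point.
find_monotonicity_violations <- function(surv) {
  surv <- surv[order(surv$Time), ]
  which(diff(surv$Survival) > 0) + 1
}

# One possible cleanup: order by time, cap survival at 1, then force it to be
# non-increasing by taking the running minimum.
make_monotone <- function(surv) {
  surv <- surv[order(surv$Time), ]
  surv$Survival <- cummin(pmin(surv$Survival, 1))
  rownames(surv) <- NULL
  surv
}

# Toy values with a deliberate upward blip at the fourth point
surv <- data.frame(Time = c(0, 2.89, 2.94, 3.02, 3.21),
                   Survival = c(1.00, 1.00, 0.98, 0.99, 0.96))
find_monotonicity_violations(surv)  # 4
make_monotone(surv)$Survival        # 1.00 1.00 0.98 0.98 0.96
```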
Hi Gianluca, I have used digitising software to obtain data like Data1 below ("Time" and "Survival") from the K-M curve, and I have the "No. at risk" values, but how can I get the "lower" and "upper" columns shown in Data2 below? Are "lower" and "upper" numbers at risk? I couldn't find the answer elsewhere; apologies if this is a silly question.
Data1:
Time Survival
1 0.00 1.00
2 2.89 1.00
3 2.94 0.98
4 3.02 0.97
5 3.02 0.96
6 3.21 0.96
7 3.43 0.94
8 4.77 0.92
9 4.84 0.91
Data2:
Interval time lower upper nrisk
1 0 1 7 61
2 4 8 14 55
3 8 15 21 49
4 12 22 27 31
5 16 28 31 13
Thanks so much! Koop
The two files, with the survival times and the nrisk, are the output of DigitizeIt.
I got it, thanks a lot.
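On Koop's question: in this layout, lower and upper do not look like numbers at risk but like row numbers of the survival file, namely the first and last digitised point falling in each interval between consecutive at-risk times. As far as I know this matches the nrisk input of the Guyot et al. (2012) reconstruction algorithm that digitise is based on, but treat it as an assumption and check it against your own data. A minimal sketch that derives the two columns; make_nrisk_table is a hypothetical helper. With the nine times in Data1, the first seven fall before time 4, which reproduces the lower = 1, upper = 7 of the first Data2 row:

```r
# Hypothetical helper (not part of survHE). Assumes 'lower'/'upper' are the
# first/last row numbers of the survival file whose time falls in
# [risk_times[i], risk_times[i + 1]); the last interval is open-ended.
make_nrisk_table <- function(surv_times, risk_times, nrisk) {
  stopifnot(length(risk_times) == length(nrisk),
            !is.unsorted(risk_times), !is.unsorted(surv_times))
  # Interval number of each digitised point (0 = before the first risk time)
  idx <- findInterval(surv_times, risk_times)
  k <- seq_along(risk_times)
  # Empty intervals get NA, to be inspected by hand
  first_row <- function(i) if (any(idx == i)) min(which(idx == i)) else NA_integer_
  last_row  <- function(i) if (any(idx == i)) max(which(idx == i)) else NA_integer_
  data.frame(Interval = k, time = risk_times,
             lower = vapply(k, first_row, integer(1)),
             upper = vapply(k, last_row, integer(1)),
             nrisk = nrisk)
}

# Usage with the nine survival times in Data1 and the first two at-risk
# times and counts from Data2
times <- c(0.00, 2.89, 2.94, 3.02, 3.02, 3.21, 3.43, 4.77, 4.84)
tab <- make_nrisk_table(times, risk_times = c(0, 4), nrisk = c(61, 55))
print(tab)
# write.table(tab, "nrisk.txt", quote = FALSE, row.names = FALSE)
```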
I've digitised survival data using DigitizeIt, but I've had some issues using the digitise function in the survHE package. I've managed to get the formatting of the survival input txt file right, but I'm not sure where to get the information needed for the 5 required columns of the number-at-risk txt file. Is there a pre-processing step that I'm still missing? Looking at the function, it seems that this might be the type of table required:
time n.risk n.event n.censor surv upper lower
1 11 138 3 0 0.9782609 1.0000000 0.9542301
2 12 135 1 0 0.9710145 0.9994124 0.9434235
3 13 134 2 0 0.9565217 0.9911586 0.9230952
4 15 132 1 0 0.9492754 0.9866017 0.9133612
5 26 131 1 0 0.9420290 0.9818365 0.9038355
6 30 130 1 0 0.9347826 0.9768989 0.8944820
Thanks and again, apologies if I'm just missing something!
Natalie