AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Panel 4H - KM curve #1335

Closed jharenza closed 2 years ago

jharenza commented 2 years ago

What needs to be done?

From #1332, create panel 4H for KM curve of HGG subtypes

Who will complete this?

@sjspielman

sjspielman commented 2 years ago

I'm not sure about including this panel, because the upper CIs are all NA, and sometimes both bounds are NA. I was reading more about how the survival package does these calculations, and it seems that it is just not possible to calculate an upper limit when the events are <50% for the given group.

But that's not the case here. Instead, we have the opposite case: nearly all samples have an event, and in most groups every sample does. The only grouping with a bounded CI is DMG, H3 K28, TP53 loss, and I expect this is because of its relatively larger sample size (but 26 is still small!). Also, as I noted here, two of the subtype groups have only N=1, which means there's no variation to model there at all.

The model doesn't reach median survival of 50% because pretty much nobody survives, i.e. lack of variation in the data. I've copied the model output here and added some comments to highlight non-events:

                                                    n events median 0.95LCL 0.95UCL
molecular_subtype=DMG, H3 K28                      13     13    536     355      NA
molecular_subtype=DMG, H3 K28, TP53 activated       7      7    294      18      NA
molecular_subtype=DMG, H3 K28, TP53 loss           26     26    275     244     394
molecular_subtype=HGG, H3 G35                       1      1   2681      NA      NA
molecular_subtype=HGG, H3 wildtype                 12      7    582     375      NA      # 5 NON-EVENTS
molecular_subtype=HGG, H3 wildtype, TP53 activated  3      2   1179    1139      NA      # 1 NON-EVENT
molecular_subtype=HGG, H3 wildtype, TP53 loss       8      8    392     349      NA
molecular_subtype=HGG, IDH, TP53 activated          1      1   1311      NA      NA
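
To make the "nearly all samples have an event" point concrete, the counts in the table above can be tallied directly. This is only a quick arithmetic sketch in Python (the analysis itself uses R's survival package); the dictionary and variable names are made up for illustration, and the numbers are copied from the `n` and `events` columns.

```python
# (n, events) per subtype, copied from the model output above.
groups = {
    "DMG, H3 K28": (13, 13),
    "DMG, H3 K28, TP53 activated": (7, 7),
    "DMG, H3 K28, TP53 loss": (26, 26),
    "HGG, H3 G35": (1, 1),
    "HGG, H3 wildtype": (12, 7),
    "HGG, H3 wildtype, TP53 activated": (3, 2),
    "HGG, H3 wildtype, TP53 loss": (8, 8),
    "HGG, IDH, TP53 activated": (1, 1),
}

total_n = sum(n for n, _ in groups.values())
total_events = sum(events for _, events in groups.values())

# Groups in which every sample has an event (no censored observations).
all_event_groups = [name for name, (n, events) in groups.items() if n == events]

print(f"{total_events} of {total_n} samples have an event "
      f"({total_events / total_n:.0%})")
print(f"{len(all_event_groups)} of {len(groups)} groups have no non-events")
```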

Because of the very small sample sizes here and the relative lack of variation which preclude estimating upper bounds (or both lower/upper), I do not think this figure contributes much to the paper beyond what could be added in a results sentence.
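
As a rough illustration of why an upper limit can come back NA even when every sample has an event, here is a simplified Kaplan-Meier sketch in Python. It is not the survival package's code: it assumes a log-scale interval built from Greenwood's variance, and it takes each limit for the median as the first time the corresponding confidence band drops to 0.5 or below. The function names and toy survival times are hypothetical. At the final event time the estimate hits 0 and the standard error is undefined, so in a small group the upper band may never get down to 0.5.

```python
import math

Z = 1.96  # approximate normal quantile for a 95% interval


def km_curve(times, events):
    """Return [(time, S, lower, upper)] at each distinct event time.

    Uses Greenwood's variance on the log scale. lower/upper are None
    where the standard error is undefined (everyone at risk has died).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, var_sum, rows = 1.0, 0.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        leaving = sum(1 for tt, _ in data if tt == t)
        deaths = sum(1 for tt, e in data if tt == t and e)
        if deaths:
            surv *= 1 - deaths / at_risk
            if deaths < at_risk:
                var_sum += deaths / (at_risk * (at_risk - deaths))
                half = Z * math.sqrt(var_sum)
                lower = surv * math.exp(-half)
                upper = min(1.0, surv * math.exp(half))
                rows.append((t, surv, lower, upper))
            else:
                rows.append((t, surv, None, None))
        at_risk -= leaving
        i += leaving
    return rows


def first_time_at_or_below_half(rows, col):
    """First time at which column `col` is defined and <= 0.5, else None."""
    for row in rows:
        if row[col] is not None and row[col] <= 0.5:
            return row[0]
    return None


def summarize(times, events):
    """Return (median, lower limit, upper limit) for one group."""
    rows = km_curve(times, events)
    return tuple(first_time_at_or_below_half(rows, col) for col in (1, 2, 3))


# Toy data: every sample has an event (1 = event, 0 = censored).
small = summarize([10, 20, 30, 40, 50], [1] * 5)
large = summarize(list(range(1, 27)), [1] * 26)
print("n=5: ", small)
print("n=26:", large)
```

With these toy inputs, the five-sample group gets a median and a lower limit but no upper limit, while the 26-sample group gets both limits. That is the same pattern as in the table, where only the N=26 group has a bounded interval.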

jharenza commented 2 years ago

My idea was to include the plot, not any of the stats - this would be for visualization only, to go along with the forest plot discussion.

sjspielman commented 2 years ago

That's fine with me as long as we have clear discussion and caption information contextualizing it! One of the plotting issues is that we can't show CIs that are infinite, so we'll have to really contextualize those stats.

jharenza commented 2 years ago

Yes, of course, it will definitely be clear! We don't have to plot the CIs. Typically I haven't seen them plotted unless there are two groups (even then, not usually) because it's too much for a figure.