AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Survival analysis of HGG tumor subtypes #1332

Closed jharenza closed 2 years ago

jharenza commented 2 years ago

❗ Stacked on #1264, but I cannot change the base to that branch

Purpose/implementation Section

What scientific question is your analysis addressing?

This PR addresses bullet 3 in this comment: a univariate survival analysis of molecular subtype within the HGG tumors. This was previously done in v17, but only for H3 WT and DMG K28, so it was quite outdated.

What was your approach?

I renamed the old notebook that did the HGG subtype analysis and performed both a log-rank test and a univariate Cox regression for molecular subtype.
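For readers unfamiliar with the two tests named above, here is a minimal sketch using the `survival` package. It is not the notebook's code: the data are synthetic, and the column names `os_days`, `os_status`, and `molecular_subtype` are assumptions made for illustration.

```r
library(survival)

# Synthetic stand-in for the HGG metadata (assumed column names).
set.seed(2022)
n <- 150
toy <- data.frame(
  molecular_subtype = factor(sample(
    c("DMG, H3 K28", "DMG, H3 K28, TP53 loss", "HGG, H3 wildtype"),
    n, replace = TRUE
  )),
  os_days = round(rexp(n, rate = 1 / 400)) + 1,
  os_status = rbinom(n, size = 1, prob = 0.7)  # 1 = deceased, 0 = censored
)

# Log-rank test: does survival differ across any of the subtypes?
logrank_fit <- survdiff(Surv(os_days, os_status) ~ molecular_subtype, data = toy)
logrank_p <- pchisq(
  logrank_fit$chisq,
  df = length(logrank_fit$n) - 1,
  lower.tail = FALSE
)

# Univariate Cox regression: subtype is the only predictor.
cox_fit <- coxph(Surv(os_days, os_status) ~ molecular_subtype, data = toy)
summary(cox_fit)
```

The log-rank test gives one overall p-value across groups, while the Cox model gives one coefficient per non-reference subtype.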

What GitHub issue does your pull request address?

NA: bullet 3 in this comment

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

double check everything

Is there anything that you want to discuss further?

I am not sure why the DMG, H3 K28 subtype is not showing up in the results. I double-checked by running the formula manually and got the same results. There are more samples in this group than in some other groups, so I am baffled. Any ideas, @sjspielman?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

yes

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Here I go screen-shotting again...

[Image: Screen Shot 2022-04-20 at 2 24 01 PM]

HGG tumors of the K28, TP53 loss subtype have an HR of 1.15 (0.41-1.88), p = 0.002, supporting current knowledge that co-occurrence of TP53 loss with K28 results in a worse prognosis.
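As a hedged illustration of how a hazard ratio and its 95% confidence interval can be pulled out of a Cox fit, the sketch below uses synthetic data and assumed column names. It does not reproduce the numbers reported above. The key point is that `coef()` is on the log-hazard scale, so both the estimate and its interval need exponentiating.

```r
library(survival)

# Synthetic two-group data (assumed column names, not the PR's data).
set.seed(42)
toy <- data.frame(
  molecular_subtype = factor(
    rep(c("DMG, H3 K28", "DMG, H3 K28, TP53 loss"), each = 60)
  ),
  os_days = round(rexp(120, rate = 1 / 400)) + 1,
  os_status = rbinom(120, size = 1, prob = 0.7)
)
cox_fit <- coxph(Surv(os_days, os_status) ~ molecular_subtype, data = toy)

# Exponentiate the log-scale coefficient and its interval to get HR and CI.
ci <- exp(confint(cox_fit))  # columns: 2.5 %, 97.5 %
hr_table <- data.frame(
  term = names(coef(cox_fit)),
  hazard_ratio = unname(exp(coef(cox_fit))),
  conf_low = unname(ci[, 1]),
  conf_high = unname(ci[, 2]),
  p_value = unname(summary(cox_fit)$coefficients[, "Pr(>|z|)"])
)
hr_table
```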

Reproducibility Checklist

Documentation Checklist

sjspielman commented 2 years ago

@jharenza thanks for working on this!!

I had a look at the missing level. It's "there"! For the Cox regression, it's the reference category, so every coefficient is interpreted relative to that reference.

# Whatever is first is, by default, the reference! Or, you can change the factor levels
levels(factor(metadata$molecular_subtype))
[1] "DMG, H3 K28"                 "DMG, H3 K28, TP53 activated" "DMG, H3 K28, TP53 loss"     
[4] "HGG, H3 wildtype"            "HGG, H3 wildtype, TP53 loss"

To get pairwise comparisons among these, you can use pairwise_survdiff. I went ahead and added a chunk in there so you can pull and see.
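The reference-level behavior described in this comment can be sketched as follows. This is a toy example with synthetic data and assumed column names, not the chunk that was added to the notebook. It shows that the first factor level gets no coefficient of its own, and that `relevel()` changes which level plays that role.

```r
library(survival)

# Synthetic data; column names are assumptions for illustration only.
set.seed(1)
subtypes <- c("DMG, H3 K28", "DMG, H3 K28, TP53 loss", "HGG, H3 wildtype")
toy <- data.frame(
  molecular_subtype = factor(rep(subtypes, each = 40)),
  os_days = round(rexp(120, rate = 1 / 400)) + 1,
  os_status = rbinom(120, size = 1, prob = 0.7)
)

# The first level is the reference, so it has no coefficient in the model.
reference_level <- levels(toy$molecular_subtype)[1]
fit_default <- coxph(Surv(os_days, os_status) ~ molecular_subtype, data = toy)
names(coef(fit_default))

# relevel() moves another level to the front, making it the new reference.
toy$subtype_releveled <- relevel(toy$molecular_subtype, ref = "HGG, H3 wildtype")
fit_releveled <- coxph(Surv(os_days, os_status) ~ subtype_releveled, data = toy)
names(coef(fit_releveled))
```

In both fits only two of the three subtypes appear among the coefficient names. The missing one is the baseline that the other hazard ratios are measured against.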

jharenza commented 2 years ago

Oh, this is perfect. Thank you!

jharenza commented 2 years ago

@sjspielman this is ready now, and this would be the panel H forest plot I was envisioning in number 2 here

jharenza commented 2 years ago

Just another two notes.

  1. I added some code to check for RNA-only samples, since they were subtyped. But they were all non-Initial CNS tumor, so they were not added.
  2. I didn't keep it in, but I did try making HGG, H3 wildtype the reference. Almost everything was significant in the hazard plot (all K28 and TP53 subtypes), but something weird happened: no reference was plotted, and HGG, H3 wildtype, even though it was the reference, did not align with HR 1 and showed significance. No idea why, so I removed this, because the more useful comparisons use H3 K28 as the reference.

jharenza commented 2 years ago

@jaclyn-taroni thank you for helping with ungroup()!! @sjspielman this is ready for re-review

jharenza commented 2 years ago

Looks good! Noting to update based on this comment - set up a separate directory for the result files.

done!

sjspielman commented 2 years ago

I had approved this, but realized one additional change needs to be made: we also need to save the data that went into the KM model to make the figure for the paper. So, instead of exporting kap_fit$model to RDS, it now exports kap_fit to RDS.

This previously passed CI right before this update, and this additional line of code I added here was confirmed to run in Docker locally with a fresh environment. Therefore this is being merged before checks are complete.
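To illustrate why exporting the whole object rather than only its `$model` element keeps the input data available for plotting, here is a minimal sketch. The structure of `kap_fit` is not shown in this thread, so the list layout below (a `model` element plus a hypothetical `original_data` element) is an assumption, as are the synthetic data and column names.

```r
library(survival)

# Synthetic data with assumed column names.
set.seed(7)
toy <- data.frame(
  molecular_subtype = factor(rep(c("DMG, H3 K28", "HGG, H3 wildtype"), each = 30)),
  os_days = round(rexp(60, rate = 1 / 400)) + 1,
  os_status = rbinom(60, size = 1, prob = 0.7)
)

# Hypothetical container: the fitted KM model together with its input data.
kap_fit <- list(
  model = survfit(Surv(os_days, os_status) ~ molecular_subtype, data = toy),
  original_data = toy
)

# Saving the whole list keeps the data alongside the model.
rds_path <- tempfile(fileext = ".RDS")
saveRDS(kap_fit, rds_path)

# A downstream figure script can then recover both pieces.
restored <- readRDS(rds_path)
names(restored)
```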