AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Update survival tp53/telomerase analysis #1276

Closed jharenza closed 2 years ago

jharenza commented 2 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

This is a followup to https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1208 and https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1212

What was your approach?

What GitHub issue does your pull request address?

NA

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

I really only updated current code to try to figure out the best way to look at these data and did not do any optimization.

I think that we can include the coxph plot from chunk 7 in Figure 4, and include some of the tabular statistics for interactions/cancer group tp53/telomerase results/interactions within the text.

Perhaps optimization and/or crisp figure generation is something @runjin326 or @sjspielman can work on? Note: @runjin326 is working on optimizing #1264, so it is possible this analysis can piggyback onto the functions being generated there.

Is there anything that you want to discuss further?

Does this workflow make sense?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

yes

Results

What types of results are included (e.g., table, figure)?

tables, KM curve, hazard plots

What is your summary of the results?

Univariate analysis

Multivariate analysis

Within cancer groups:

Reproducibility Checklist

Documentation Checklist

jaclyn-taroni commented 2 years ago

Just adding a link to this comment where interaction was discussed, because I keep looking for it and the title of the PR it's on doesn't make me think it would be there: https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1216#issuecomment-1011153246

sjspielman commented 2 years ago

@jharenza I will make the publication-ready figures for all the survival notebooks once we have the modules solidified. We can open separate issue(s) there once we have a better sense which survival analysis/es will be included in the manuscript.

In terms of optimizing the code, I'll start with this PR. We might want to have this PR block #1264, depending on what kind of reorganization has to happen here that would be shared with #1264. After I review this a bit, I will have a better sense.

sjspielman commented 2 years ago

@jharenza Is there a specific reason the template functions shown in survival-analysis_template.Rmd aren't used? The main difference I can see is that this code uses survival years instead of days, but this is easy to get around. Is there a specific reason we are using years here instead of days? The best way to approach this updated survival notebook and the one coming up in #1264 is to use the existing functions and template as much as possible to ensure overall modularity throughout the analysis.

I'm going to file a PR to your branch that leverages the util/survival_models.R functions and we can go from there!

jharenza commented 2 years ago

Thanks @sjspielman. I think years were used because the x-axis was unreadable using days, but that can probably also be fixed using fixed spacing in the plot. I'm not entirely sure about the functions in general. It could be that those functions were univariate only, and I have a vague memory that there might have been some plot rendering issues.
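One way to keep survival time in days for the model while still getting a readable axis is to place ticks at yearly intervals and label them in years. The sketch below is language-agnostic and written in Python only for illustration; the notebook itself is in R. The function name `yearly_ticks` and the 365.25 days-per-year constant are assumptions, not something taken from the repository.

```python
# Assumed conversion factor; the analysis code may use a different one.
DAYS_PER_YEAR = 365.25

def yearly_ticks(max_days, step_years=1):
    """Return (position_in_days, label_in_years) pairs up to max_days."""
    ticks = []
    year = 0
    while year * DAYS_PER_YEAR <= max_days:
        # The position stays in days so the underlying data are untouched;
        # only the label is expressed in years.
        ticks.append((year * DAYS_PER_YEAR, str(year)))
        year += step_years
    return ticks

ticks = yearly_ticks(1000)
```

The positions and labels can then be handed to whatever plotting layer is in use as axis breaks and break labels.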

jharenza commented 2 years ago

> Just adding a link to this comment where interaction was discussed, because I keep looking for it and the title of the PR it's on doesn't make me think it would be there: https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1216#issuecomment-1011153246

Thank you for finding that. I knew we discussed it, but I didn't remember why we didn't look at this. Given the question we have, which is whether the TP53 and telomerase scores have some combined predictive value (does adding the p53 score change the telomerase prediction), I think we need to add the interaction terms.

What I still need help with is interpretation, though.

sjspielman commented 2 years ago

> What I still need help with is interpretation, though.

Interpreting these interaction models is not going to be particularly straightforward, beyond just plainly stating which interactions are significant and what the coefficients are. This is not something that is easily described beyond pure mathematical terms; I don't have a "gut sense" beyond that, as it were.
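For reference, the mathematical statement can be written out as follows, assuming a Cox proportional hazards model with both scores entered as continuous covariates. The symbols ($x_T$ for telomerase score, $x_P$ for TP53 score, and the $\beta$ coefficients) are illustrative and not taken from the notebook.

```latex
% Cox model with an interaction between telomerase score x_T and TP53 score x_P
h(t \mid x_T, x_P) = h_0(t)\,\exp\!\left(\beta_T x_T + \beta_P x_P + \beta_{TP}\, x_T x_P\right)

% Hazard ratio for a one-unit increase in telomerase score at a fixed TP53 score x_P
\mathrm{HR}_T(x_P) = \frac{h(t \mid x_T + 1, x_P)}{h(t \mid x_T, x_P)} = \exp\!\left(\beta_T + \beta_{TP}\, x_P\right)
```

Under these assumptions, a nonzero $\beta_{TP}$ means the hazard ratio associated with telomerase score is not a single number but changes with the TP53 score, which is why the coefficients are hard to summarize in plain language.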

jharenza commented 2 years ago

Thanks for the updates @sjspielman - I will take a look in more detail tomorrow, but at first glance, I noticed that in the "TP53 score * Telomerase score, for each cancer group." section, the analyses are not subsetting each group (at least I do not think they are doing it properly). The N being plotted is the same for every group (all 548 samples) and all tables and plots are identical. I had this issue before, but will have to go back and check tomorrow how I had to handle it. Survival analysis is pesky!
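The symptom described above (the same N for every group) is what one would expect if each per-group model is handed the full table rather than its own subset. The following is a minimal sketch of a sanity check for that, written in Python for illustration only since the notebook is in R. The rows are toy data, and the column name `cancer_group` and the function names are hypothetical.

```python
from collections import defaultdict

# Toy rows with hypothetical column names; these are not OpenPBTA data.
samples = [
    {"sample_id": "s1", "cancer_group": "group_a"},
    {"sample_id": "s2", "cancer_group": "group_a"},
    {"sample_id": "s3", "cancer_group": "group_b"},
    {"sample_id": "s4", "cancer_group": "group_c"},
    {"sample_id": "s5", "cancer_group": "group_c"},
    {"sample_id": "s6", "cancer_group": "group_c"},
]

def split_by_group(rows, key="cancer_group"):
    """Return {group: [rows in that group]} so each model sees only its subset."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

def validate_subsets(subsets, total_n):
    """Return per-group N, raising if the subsets do not partition the table."""
    counts = {group: len(rows) for group, rows in subsets.items()}
    if len(counts) > 1 and any(n == total_n for n in counts.values()):
        raise ValueError("a group subset contains every sample; subsetting failed")
    if sum(counts.values()) != total_n:
        raise ValueError("per-group N does not sum to the total N")
    return counts

group_counts = validate_subsets(split_by_group(samples), len(samples))
```

Running a check like this before fitting each per-group model would flag the case where every table and plot comes out identical.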

sjspielman commented 2 years ago

@jaclyn-taroni @jharenza This is now ready for a look. I'm pretty satisfied with this notebook at this stage! Some of the changes I included (in particular a function) can be used in the immune survival notebook I'll get to next, as well.

Summary of the changes I've made:

(quick jump to rendered notebook)

Note also that the data prep is actually pretty different between the immune scores survival analysis and this one, so we do not need to have a shared data prep script.

jharenza commented 2 years ago

this looks good to me now - ready for re-review @jaclyn-taroni !

sjspielman commented 2 years ago

This passed CI successfully before master was merged in, so this PR can be merged before checks are complete.