Closed runjin326 closed 2 years ago
I'm going to take the first look at this and then I'll loop in @envest for thoughts on next steps!
@envest - thanks so much for the detailed review! I have made the suggested changes and commented/resolved some of your comments above. I think this is ready for another look! @jaclyn-taroni
CI failure is unrelated to the changes in this PR. I just hit rerun.
I noticed that the step testing these changes ran after the one that timed out, so in https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1212/commits/4ad379100f07710967c24a517c9b1feaeab19f9f I moved it up. That way, if we still have the timeout problem, we can at least see whether everything for these changes looks okay!
A summary of the development:
- Using `broad_histology_display` still does not converge - hence I added the `oncoprint_group`, which seemed to be converging and generated informative results
- Density plot stratified by `oncoprint_group` is also included
The addition of `oncoprint_group` looks interesting -- according to the model, within the same `oncoprint_group`, `tp53` and `extend` are less important as predictors.
Things to discuss for next steps:
- Do we want to only keep the `Initial CNS Tumor` during the histology file filtering step?
- Does `oncoprint_group` make sense? If so, I can then remove the `broad_histology_display` part. Alternatively, other suggestions for grouping them are welcome.
My feeling on `oncoprint_group` is: would this analysis make sense a priori before looking at the data? If yes, then that's something to consider including. Unfortunately I am not up to speed on the biological implications of `oncoprint_group` for this project.
With the multivariate models and visualizations in place 👍 , I think it's best I leave it to @jaclyn-taroni to help wrap up / summarize next steps.
To weigh in on the `oncoprint_group` discussion, I'm not sure that makes sense here or anywhere outside of the specific purpose it is used for – we expect it to only be used in display for Oncoprints, where individual cancer groups are also displayed and when we have essentially curated lists of genes to display.
@jaclyn-taroni, thanks for the feedback! Yes - I tapped into this column because the `broad_histology_display` groups are too granular for the multivariate analysis, so I was trying to see whether there are even broader terms to use. Since this is not desired, should we just drop it and broad histology and only keep HGAT vs. non-HGAT for our final analysis?
> Since this is not desired, should we just drop it and broad histology and only keep HGAT vs. non-HGAT for our final analysis?
Yea I think that sounds good @runjin326, thank you! Those comparisons both seem well-justified to me but only one of them (HGAT vs. non-HGAT) appears to have sufficient data.
@jaclyn-taroni, changes pushed - now the only question would be the sample selection portion.
> now the only question would be the sample selection portion.
I can definitely see an argument for sticking with Initial CNS tumor only. In that case, I don't know why we need to use the primary plus list of independent specimens unless there's no other way to consistently pick an Initial CNS Tumor specimen (if there are multiple) across different analyses without using that list.
@jaclyn-taroni, I see your point now! So I modified the code to not use the independent RNA primary-plus list and instead called `distinct(sample_id, .keep_all = TRUE)` on the `meta-indep` after combining TP53 and telomerase scores. Please check to see whether it looks good now.
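For anyone reading along who is less familiar with it: in dplyr, `distinct(sample_id, .keep_all = TRUE)` keeps the first row for each `sample_id` and retains all other columns. Below is a minimal Python sketch of that keep-first behavior, not the notebook's actual code. The rows, the column names `tp53_score` and `extend_score`, and the helper `distinct_keep_first` are all hypothetical and made up for illustration.

```python
# Hypothetical rows standing in for the combined TP53/telomerase score table.
# Sample IDs, column names, and values are invented for illustration only.
rows = [
    {"sample_id": "S1", "tp53_score": 0.91, "extend_score": 0.40},
    {"sample_id": "S1", "tp53_score": 0.88, "extend_score": 0.42},
    {"sample_id": "S2", "tp53_score": 0.15, "extend_score": 0.77},
]


def distinct_keep_first(records, key):
    """Keep the first record seen for each value of `key`, preserving order.

    This mirrors the keep-first semantics of dplyr's
    distinct(<key>, .keep_all = TRUE): all columns are retained.
    """
    seen = set()
    kept = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            kept.append(record)
    return kept


deduplicated = distinct_keep_first(rows, "sample_id")
for row in deduplicated:
    print(row["sample_id"], row["tp53_score"])
```

Because the first row wins, the row order before deduplication determines which row is kept when a `sample_id` appears more than once.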
@jaclyn-taroni - I have made corresponding changes. Please review!
Thanks @runjin326 - looks good! I'll merge once CI passes.
Purpose/implementation Section
What scientific question is your analysis addressing?
This PR addresses the discussion we have in this PR
What was your approach?
Currently, the notebook only does univariate analysis for the following:
The Cox regression results cannot be plotted, but the p-values were output. For the categorical variables (points 3 and 4), survival plots were generated.
What GitHub issue does your pull request address?
NA
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
I did not use the `survival_analysis` function in the notebook to make things easier - please check to see whether the functions to fit the survival model and generate plots make sense.
Is there anything that you want to discuss further?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes.
Results
What types of results are included (e.g., table, figure)?
Plots
What is your summary of the results?
Looks like all 4 models that we looked at generated significant results.
Reproducibility Checklist
Documentation Checklist
- `README` and it is up to date.
- `analyses/README.md` and the entry is up to date.