Closed sjspielman closed 1 year ago
Since I'll be out for a couple days, just wanted to note where this is heading next. This notebook is still draft-y and not part of this PR, but sharing in case anyone is curious.
Link to current notebook Rmd: https://github.com/sjspielman/OpenPBTA-analysis/blob/0a3202f71a73dad7dedec6a82612e0f74edb493d/analyses/tp53_nf1_score/10-tp53-tumor-purity-threshold.Rmd
HTML for download: 10-tp53-tumor-purity-threshold.nb.html.zip
Importantly, I uncovered a couple areas where we reported outdated P-values, and 1-2 other small inconsistencies in the MS we should have our eyes on.
This is now ready for another look!
- Results now go in `results/tumor-purity-threshold/` along with the notebooks. At first I did this just for notebooks as suggested, but then I decided a bit more organization would be nice since there are many result files. This involved code updates such as:
  - Using the `output_file` argument in `rmarkdown::render()` to specify separate HTML outputs.
  - Ensuring `results_dir` points to the right directory, while still making sure polyA results are always read in from the primary `results/` directory (since polyA wasn't invited to the tumor purity party).
  - For the `06` python script, I did have to add some opt parse and function arguments to handle this. I set their defaults to what the main pipeline uses, which was hardcoded in here before these changes.
- `results/tp53_scores_vs_molecular_subtype_Ependymal_tumor.tsv` (and its associated plot `plots/tp53_scores_vs_molecular_subtype_Ependymal_tumor.png`) are present because there is actually one fewer molecular subtype at `v23` compared to when this module was last run, hence the one fewer subtype in these results.
- There is a chunk in the `05` notebook that exports several plots of tp53 altered status by cancer predisposition. I "turned this chunk off" for the tumor threshold pipeline as we do not need to create those plots.
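For anyone skimming, the `06` script change described above can be sketched roughly as follows. This is a minimal illustration and not the actual code in this PR: it assumes "opt parse" refers to Python's standard `optparse` module, and the option name `--results-dir`, the `DEFAULT_RESULTS_DIR` constant, the `results_path()` helper, and the file name `scores.tsv` are all hypothetical. The polyA rule really lives in the notebooks; it is mirrored here only to show the idea.

```python
from optparse import OptionParser
from pathlib import Path

# Hypothetical stand-in for the value that used to be hardcoded.
DEFAULT_RESULTS_DIR = "results"


def build_parser():
    """Build a parser whose default matches what the main pipeline uses."""
    parser = OptionParser()
    parser.add_option(
        "-r",
        "--results-dir",
        dest="results_dir",
        default=DEFAULT_RESULTS_DIR,
        help="directory to read/write results [default: %default]",
    )
    return parser


def results_path(results_dir, filename, is_polya=False):
    """Return the path for a result file.

    polyA results always come from the primary results directory,
    whatever results_dir was passed on the command line.
    """
    base = Path(DEFAULT_RESULTS_DIR) if is_polya else Path(results_dir)
    return base / filename


# Passing an explicit argument list keeps the sketch independent of sys.argv.
opts, _ = build_parser().parse_args(
    ["--results-dir", "results/tumor-purity-threshold"]
)
print(results_path(opts.results_dir, "scores.tsv"))
print(results_path(opts.results_dir, "scores.tsv", is_polya=True))
```

Because the default equals the previously hardcoded value, calling the script without the new option should behave as it did before, and only the tumor purity pipeline needs to pass it.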
Part of #1624
This PR begins the process of running TP53 with tumor purity filtered data. Since a lot of module changes had to be made to generate new results, this PR focuses only on that step. There will be a second PR that adds a final notebook to this module in order to compare these results to the original ones reported in the manuscript (those next steps are started in this branch: https://github.com/sjspielman/OpenPBTA-analysis/tree/tumor_purity-tp53-notebook).
There are a variety of changes that I made here, attempting to keep code as-is as much as possible. I created a new script `run_classifier-tumor-purity-threshold.sh` which runs the relevant scripts in this module to re-generate results with this filtered data. This script specifically calls (and does not call) the following for stranded data only:
- `01-apply-classifier.py`
- `02-qc-rna_expression_score.Rmd` is not run as it does not produce any output that is consumed later.
- `03-tp53-cnv-loss-domain.Rmd` and `04-tp53-sv-loss.Rmd` `05-tp53-altered-annotation.Rmd`.
- `05-tp53-altered-annotation.Rmd`
- `06-evaluate-classifier.py`
- `07-plot-roc.R` is not run. ROC plots will be separately made in the forthcoming notebook.
- `08-compare-molecularsubtypes-tp53scores.R` and `09-compare-histologies.R` are not run since they are not really relevant here.

This script is documented in the README and is also in CI. After running this through and generating result files that can be analyzed, I also ran the normal pipeline again to ensure notebooks are rendered from the full dataset.
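As a quick reference, the run/skip list above can be summarized in a few lines of Python. This is only a hypothetical summary, not the contents of `run_classifier-tumor-purity-threshold.sh`; the `STEPS` mapping and `steps_to_run()` are names made up for illustration, and the sketch lists only the steps whose status is spelled out in the list (so `03` and `04` are left out).

```python
# Hypothetical summary of the list above: True means the step is called
# by the tumor purity script, False means it is skipped.
STEPS = {
    "01-apply-classifier.py": True,
    "02-qc-rna_expression_score.Rmd": False,  # output is not consumed later
    "05-tp53-altered-annotation.Rmd": True,
    "06-evaluate-classifier.py": True,
    "07-plot-roc.R": False,  # ROC plots come from the forthcoming notebook
    "08-compare-molecularsubtypes-tp53scores.R": False,  # not relevant here
    "09-compare-histologies.R": False,  # not relevant here
}


def steps_to_run(steps):
    """Return the names of the steps flagged to run, in listed order."""
    return [name for name, run in steps.items() if run]


for name in steps_to_run(STEPS):
    print(name)
```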
I'll request review once checks pass!