Closed sjspielman closed 1 year ago
Noting this passed CI through the tp53_nf1_score module before I merged in master.
In the spirit of moving https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1665 along, I'm going to go ahead and update this notebook in this PR to export some of the relevant plots.
how many LFS patients?
I poked around in the histologies file, independently of what you have here, and I believe 9 is correct.
Will check this, but the MS states "tumors" (presumably those which have RNA-Seq and therefore TP53 scores).
@sjspielman I am not seeing an HTML file for 10-tp53-tumor-purity-threshold.Rmd
Oh, I see them in a different folder. Can we keep them in the main folder, or do you want to move the actual Rmd files to the threshold folder?
@jharenza I am working on this now
The HTML file for 10-tp53-tumor-purity-threshold.Rmd is in the results/tumor-purity-threshold/ directory.
Also note the LFS is ok - there are a few samples per patient in there.
Something I'm finding while wrapping this up is that the shuffled AUC is not at all reproducible. I'm hunting down where some seed was probably not set..
👍
Here, it appears that all but 2 tumors have scores >0.5, so we may want to check this aspect as well.
I eyeballed this: looking at panel 4D, I only see 7 tumors highlighted. Checking the bs_ids you posted in notebook 10 as hypermutators, I also see these come from only 6 patients, and I'm not sure we make clear that Figure 4D is not an independent sampling. Should we revise or clarify this?
> v23 %>%
+   filter(Kids_First_Biospecimen_ID %in% hypermutator_bs_ids) %>%
+   select(Kids_First_Participant_ID, Kids_First_Biospecimen_ID, pathology_diagnosis, tumor_descriptor) %>%
+   arrange(Kids_First_Participant_ID)
# A tibble: 8 × 4
  Kids_First_Participant_ID Kids_First_Biospecimen_ID pathology_diagnosis                              tumor_descriptor
  <chr>                     <chr>                     <chr>                                            <chr>
1 PT_0SPKM4S8               BS_VW4XN9Y7               High-grade glioma/astrocytoma (WHO grade III/IV) Initial CNS Tumor
2 PT_3CHB9PK5               BS_20TBZG09               High-grade glioma/astrocytoma (WHO grade III/IV) Initial CNS Tumor
3 PT_3CHB9PK5               BS_8AY2GM4G               High-grade glioma/astrocytoma (WHO grade III/IV) Progressive
4 PT_EB0D3BXG               BS_F0GNWEJJ               Neuroblastoma                                    Progressive
5 PT_JNEV57VK               BS_85Q5P8GF               High-grade glioma/astrocytoma (WHO grade III/IV) Initial CNS Tumor
6 PT_JNEV57VK               BS_P0QJ1QAH               High-grade glioma/astrocytoma (WHO grade III/IV) Progressive
7 PT_S0Q27J13               BS_P3PF53V8               High-grade glioma/astrocytoma (WHO grade III/IV) Initial CNS Tumor
8 PT_VTM2STE3               BS_02YBZSBY               High-grade glioma/astrocytoma (WHO grade III/IV) Progressive
At this stage, I generally think clarifying >> revising. Just make sure this one says "samples" maybe and not "tumors"?
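To make the samples-versus-patients distinction concrete, here is a minimal pandas sketch that recounts the hypermutator rows listed in the tibble above. The IDs are copied from that output; the variable names are illustrative only and do not come from the module.

```python
import pandas as pd

# Participant/biospecimen pairs copied from the tibble in this thread.
hypermutators = pd.DataFrame(
    {
        "Kids_First_Participant_ID": [
            "PT_0SPKM4S8", "PT_3CHB9PK5", "PT_3CHB9PK5", "PT_EB0D3BXG",
            "PT_JNEV57VK", "PT_JNEV57VK", "PT_S0Q27J13", "PT_VTM2STE3",
        ],
        "Kids_First_Biospecimen_ID": [
            "BS_VW4XN9Y7", "BS_20TBZG09", "BS_8AY2GM4G", "BS_F0GNWEJJ",
            "BS_85Q5P8GF", "BS_P0QJ1QAH", "BS_P3PF53V8", "BS_02YBZSBY",
        ],
    }
)

# Distinct samples (biospecimens) versus distinct patients (participants).
n_samples = hypermutators["Kids_First_Biospecimen_ID"].nunique()
n_patients = hypermutators["Kids_First_Participant_ID"].nunique()

# Number of samples contributed by each patient.
samples_per_patient = hypermutators.groupby("Kids_First_Participant_ID").size()

print(f"{n_samples} samples from {n_patients} patients")
print(samples_per_patient[samples_per_patient > 1])
```

Two participants (PT_3CHB9PK5 and PT_JNEV57VK) each contribute an initial and a progressive sample, which is why "samples" is the safer word for the figure text.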
@jaclyn-taroni I might want to loop you back in for the reproducibility issue here. I have been observing that the shuffled AUC is not at all reproducible across runs, and I was hoping to track it down.
I tried re-setting a seed again in the function that actually performs the shuffling: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/dd675dd681a5a69b94ec753d1009ef259f2405a6/analyses/tp53_nf1_score/utils.py#L54-L61
using two approaches:
- np.random.seed(123) - didn't help
- np.random.default_rng(123) (np.random.seed() is still supported as legacy code in this version, see docs: https://numpy.org/doc/1.17/reference/random/generator.html). I updated the function as follows, but it also didn't help with reproducibility:
import numpy as np
rng = np.random.default_rng(123)
return rng.permutation(gene.tolist())
Also didn't help.
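For comparison, here is a minimal sketch of a column shuffle that is reproducible in isolation. It assumes the shuffle permutes each column of an expression data frame independently; `shuffle_expression` and the toy data are hypothetical and are not the code in utils.py. One seeded Generator is created once and reused for every column, rather than being re-created inside the per-column function.

```python
import numpy as np
import pandas as pd


def shuffle_expression(df, seed):
    """Shuffle each column independently with a single seeded Generator.

    Hypothetical helper for illustration; not the module's function.
    """
    rng = np.random.default_rng(seed)
    # Draw one permutation per column from the same Generator so the
    # whole shuffled frame is determined by `seed` alone.
    shuffled = {col: rng.permutation(df[col].to_numpy()) for col in df.columns}
    return pd.DataFrame(shuffled, index=df.index)


# Toy data standing in for an expression matrix (samples x genes).
toy = pd.DataFrame({"geneA": range(10), "geneB": range(10, 20)})

first = shuffle_expression(toy, seed=123)
second = shuffle_expression(toy, seed=123)
print(first.equals(second))
```

If a shuffle like this is stable across runs but the shuffled AUC still varies, the remaining randomness may come from another step in the pipeline rather than from the shuffle itself.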
For my final run through of the tumor purity pipeline, AUC ended up at 0.49 which is what I've pushed in https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1670/commits/a0fa06ec9e318ef5135a86fed72e68e86308c19f.
My gut still tells me this is a version vs. code problem, but it's not a quick fix. I don't want this to really hold us up, so perhaps this comment represents the official documentation that there is something funky specifically with reproducibility for shuffled tumor purity AUC (maybe shuffled more generally..).
So I would say: given that the display/calculation & reporting is a single instance of shuffling the gene labels (as I understand it), there are some inherent limitations, and no result you've encountered really materially changes interpretation. I think noting it in the README is appropriate.
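As a rough illustration of why a single shuffle has inherent limitations, the sketch below repeats a label shuffle many times and looks at the spread of the resulting AUCs. It is a simplification under stated assumptions: it shuffles labels against fixed synthetic scores rather than shuffling gene expression and re-applying the classifier, the data are random numbers (not OpenPBTA data), and `auc` and `shuffled_aucs` are hypothetical helpers.

```python
import numpy as np


def auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity; assumes no tied scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Rank scores from 1 (lowest) to n (highest).
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def shuffled_aucs(labels, scores, n_shuffles, seed):
    """AUC for each of `n_shuffles` random relabelings, from one seeded Generator."""
    rng = np.random.default_rng(seed)
    return np.array(
        [auc(rng.permutation(labels), scores) for _ in range(n_shuffles)]
    )


# Synthetic example: 10 positives and 30 negatives with random scores.
scores = np.random.default_rng(0).random(40)
labels = np.array([1] * 10 + [0] * 30)

aucs = shuffled_aucs(labels, scores, n_shuffles=1000, seed=123)
print(f"mean={aucs.mean():.2f} min={aucs.min():.2f} max={aucs.max():.2f}")
```

With a modest number of samples, individual shuffles can land well away from 0.5 even though the average sits near it, so a single reported shuffled AUC is best read as one draw from this distribution.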
Closes #1624
This PR adds a notebook, 10-tp53-tumor-purity-threshold.Rmd, to the tp53_nf1_score module to complete the tumor purity re-analysis.
Notable file changes:
- 03 notebooks are re-rendered here.
- 10-tp53-tumor-purity-threshold.Rmd and its output results/tumor-purity-threshold/10-tp53-tumor-purity-threshold.nb.html, for result comparison
- An export argument, with a default of TRUE, to the plot_roc() function, to be able to show plots in ^ notebook without having to save ROCs to a file. So, I add export = FALSE in that notebook.
- README.md about this notebook
The notebook itself raises some important manuscript revision points, including:
Tagging in @jaclyn-taroni and @jharenza for some discussion here, since I'm not sure for the last two bullet points whether we have MS typos or if I have some wonky calculations in this notebook?
Once this is merged and we have a game plan for all bullets that need game plans, I'll get those issues filed and moving along in OpenPBTA-manuscript.
Edit - It's also worth noting the new shuffled AUC is 0.34, which is rather less than 0.5...
Another edit! - This notebook now also exports plots we can use in #1665. Associated changes were made in https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1670/commits/b0793c534187ae705ea1e68c4203057df95fbbba and https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1670/commits/de513ac486be7f751ce148414159b8293248acb7.