jaclyn-taroni closed this issue 4 years ago
From a conversation with @gwaygenomics and @PichaiRaman I have some additional info that may be helpful:
Hi Pichai and Casey,
Probably the easiest way to apply the TP53 classifier is by swapping in PBTA data in this script: https://github.com/marislab/pdx-classification/blob/master/1.apply-classifier.ipynb
This is the analysis done for the PPTC PDX paper.
Thanks! Greg
Updated to add NF1 since that classifier also works well in pediatric data and both can be accomplished at the same time.
Relevant pub for NF1: Way et al. BMC Genomics. 2017.
@kgaonkar6 will work on this!
It is perhaps important to note the distinction here:
Relevant pub for NF1: Way et al. BMC Genomics. 2017.
This demonstrated proof of concept for NF1 loss classification in Glioblastoma specifically.
Initially described in Knijnenburg et al. Cell Reports. 2018.
This trained a TP53 alteration classifier using pan-cancer data (so many more cancer types than just glioblastoma).
Applied in Rokita et al. bioRxiv. 2019.
This used the Knijnenburg et al. classifier for the TP53 analysis, whereas the NF1 and Ras coefficients were built in the original PanCancer classifier paper. The current PBTA analysis likewise uses the Knijnenburg coefficients for the TP53 analysis and the original PanCan classifier coefficients for the NF1 analysis.
Edit:
Specified references for TP53 and NF1 classifiers 👀
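For readers unfamiliar with what "applying the classifier coefficients" means in practice, here is a minimal sketch. It assumes the classifiers are logistic-regression models whose per-gene coefficients are applied to gene-wise z-scored expression; that scaling choice, the function names, the gene names, and the weights below are all hypothetical placeholders, not the published coefficients or the notebook's actual code.

```python
import math


def zscore_columns(matrix):
    """Scale each gene (column) to mean 0 and unit standard deviation across samples."""
    n_samples = len(matrix)
    n_genes = len(matrix[0])
    scaled = [[0.0] * n_genes for _ in range(n_samples)]
    for j in range(n_genes):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_samples
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n_samples)
        for i in range(n_samples):
            # Genes with no variance carry no signal; set them to 0.
            scaled[i][j] = (column[i] - mean) / sd if sd > 0 else 0.0
    return scaled


def apply_classifier(expression, genes, weights, intercept=0.0):
    """Return one score in (0, 1) per sample.

    expression: list of sample rows, each a list of per-gene values
    genes:      column names matching the rows of `expression`
    weights:    dict mapping gene name -> coefficient (hypothetical here)
    """
    # Only genes present in both the data and the coefficient table are used.
    shared = [g for g in genes if g in weights]
    idx = [genes.index(g) for g in shared]
    subset = [[row[i] for i in idx] for row in expression]
    scaled = zscore_columns(subset)
    scores = []
    for row in scaled:
        linear = intercept + sum(x * weights[g] for x, g in zip(row, shared))
        scores.append(1.0 / (1.0 + math.exp(-linear)))  # logistic function
    return scores


# Toy data: 4 samples x 3 genes (placeholder names and values).
genes = ["GENE_A", "GENE_B", "GENE_C"]
expression = [
    [10.0, 1.0, 5.0],
    [1.0, 10.0, 5.0],
    [5.0, 5.0, 5.0],
    [6.0, 4.0, 5.0],
]
# GENE_X is absent from the data and is therefore ignored.
weights = {"GENE_A": 1.5, "GENE_B": -2.0, "GENE_X": 0.7}

scores = apply_classifier(expression, genes, weights)
print(scores)
```

Swapping in a different cohort then amounts to replacing `expression` and `genes` while keeping the coefficient table fixed, which is why the TP53 and NF1 scores can be computed in the same pass.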
Once #128 has been completed, we will want to include CNV data when evaluating the results of these classifiers. Note also that the poly-A data currently contain very few NF1 alterations, so the AUROC results may be misleading (see the discussion on #385). We may want to add confidence intervals to the plot; @cgreene found a package for this: https://rdrr.io/cran/pROC/man/ci.auc.html.
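To illustrate why a handful of positives makes the AUROC hard to trust, here is a rough Python sketch of a stratified percentile-bootstrap confidence interval. It is a stand-in for illustration only, not the `pROC` implementation linked above; the function names, labels, and scores are made up.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling positives and negatives separately."""
    rng = random.Random(seed)
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    stats = []
    for _ in range(n_boot):
        # Stratified resampling keeps the class sizes fixed in every replicate.
        boot_pos = [rng.choice(pos) for _ in pos]
        boot_neg = [rng.choice(neg) for _ in neg]
        boot_labels = [1] * len(boot_pos) + [0] * len(boot_neg)
        stats.append(auroc(boot_labels, boot_pos + boot_neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy example: only 3 altered samples among 15 (made-up classifier scores).
labels = [1, 1, 1] + [0] * 12
scores = [0.9, 0.6, 0.4,
          0.8, 0.5, 0.45, 0.3, 0.3, 0.25, 0.2, 0.2, 0.15, 0.1, 0.1, 0.05]

point = auroc(labels, scores)
lower, upper = bootstrap_auroc_ci(labels, scores)
print(f"AUROC = {point:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With so few positives, each bootstrap replicate is drawn from only three positive scores, so the interval tends to be wide; plotting it alongside the point estimate makes that uncertainty visible.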
Addressed via the linked pull requests. Closing in favor of filing an updated analysis ticket if needed.
Hi @jaclyn-taroni, do we need an updated analysis ticket for this analysis that includes CNV data?
Good idea @kgaonkar6
Scientific goals
Identify samples that have TP53 or NF1 inactivation using gene expression data.
Proposed methods
Required input data
Gene expression data. I believe relative abundance data at the gene-level is what was used to train the classifier. In this cohort that would correspond to:
pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
Proposed timeline
I think this could be accomplished in 2 weeks.
Relevant literature