Closed: jharenza closed this issue 3 years ago.
I'm looking into the domains for TP53 for step 2 in biomart+pfam, and these are the domains and locations I've found: P53_TAD (PF08563), P53 (PF00870), and P53_tetramer (PF07710). However, we don't have genomic location info from Pfam for TAD2 (PF18521).
Output of `bioMartDataPfam %>% dplyr::filter(hgnc_symbol == "TP53")`:

| row | hgnc_symbol | pfam_id | chromosome_name | gene_start | gene_end | strand | NAME | DESC | domain_chr | domain_start | domain_end |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TP53 | PF08563 | 17 | 7661779 | 7687550 | -1 | P53_TAD | P53 transactivation motif | 17 | 7676390 | 7676582 |
| 2 | TP53 | PF00870 | 17 | 7661779 | 7687550 | -1 | P53 | P53 DNA-binding domain | 17 | 7673755 | 7676387 |
| 3 | TP53 | PF07710 | 17 | 7661779 | 7687550 | -1 | P53_tetramer | P53 tetramerisation motif | 17 | 7670637 | 7673573 |
| 4 | TP53 | PF18521 | 17 | 7661779 | 7687550 | -1 | NA | NA | NA | NA | NA |
| 5 | TP53 | | 17 | 7661779 | 7687550 | -1 | NA | NA | NA | NA | NA |
Pfam location/name info was obtained from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/pfamDesc.txt.gz and http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ucscGenePfam.txt.gz.
It also seems that cBioPortal does not render TAD2 either.
Should we continue with only the 3 domains above, or include PF18521 and find the genomic location for that domain?
@kgaonkar6 since this gene is on the reverse strand, the start and end locations are actually reversed. It also looks like the Pfam database calls the TAD one domain instead of two, which may reflect a recent (last 5+ years) discovery. The TAD domain ends at 7676390 and the DBD starts 3 bp later at 7676387, so I think it is safe to treat the TAD as one domain here.
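A quick way to sanity-check the 3 bp figure is to put the Pfam coordinates from the bioMart output above into 5' to 3' order for a minus-strand gene. This is a minimal sketch, assuming the coordinates are compared exactly as reported (no adjustment for UCSC's 0-based start convention); the helper names `transcript_order` and `inter_domain_gaps` are hypothetical.

```python
# Pfam domain coordinates for TP53 on chr17 (hg38), copied from the
# bioMart/UCSC Pfam output above as (domain_start, domain_end).
TP53_DOMAINS = {
    "P53_TAD": (7676390, 7676582),
    "P53": (7673755, 7676387),
    "P53_tetramer": (7670637, 7673573),
}


def transcript_order(domains, strand):
    """Return (name, five_prime, three_prime) tuples in 5' to 3' order.

    On the minus strand (-1) transcription runs from high to low
    coordinates, so a domain's 5' end is its larger coordinate.
    """
    rows = []
    for name, (start, end) in domains.items():
        if strand == -1:
            rows.append((name, end, start))
        else:
            rows.append((name, start, end))
    # On the minus strand the most 5' domain has the largest 5' coordinate.
    rows.sort(key=lambda row: row[1], reverse=(strand == -1))
    return rows


def inter_domain_gaps(rows):
    """Coordinate distance from each domain's 3' end to the next domain's 5' end."""
    return [
        (upstream[0], downstream[0], abs(upstream[2] - downstream[1]))
        for upstream, downstream in zip(rows, rows[1:])
    ]


for upstream, downstream, distance in inter_domain_gaps(transcript_order(TP53_DOMAINS, -1)):
    print(f"{upstream} -> {downstream}: {distance} bp")
```

This prints `P53_TAD -> P53: 3 bp` and `P53 -> P53_tetramer: 182 bp`. These are genomic coordinate distances, not protein distances, so a gap can include intronic sequence.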
First QC: Perform a correlation between RNA-Seq expression values and TP53 classifier scores. Are these inversely correlated as we would expect?
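A minimal sketch of this QC check, assuming one TP53 expression value (e.g. TPM) and one classifier score per matched RNA-Seq sample. It uses a Spearman rank correlation, which is our choice here (the comment does not name a method), implemented with the standard library only; the values in the usage lines are made up. A clearly negative coefficient would be consistent with the expected inverse relationship.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example values, not project data.
tp53_tpm = [5.0, 12.0, 30.0, 45.0]
tp53_score = [0.9, 0.7, 0.3, 0.1]
print(spearman(tp53_tpm, tp53_score))  # -1.0
```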
Since this ticket is open, I'm adding a comment here about QCing stranded vs. poly-A samples as part of the QC step above, per the linked comment.
Wanted to make a note of this observation (potentially for discussion in the paper).
Within the tp53_nf1_score module, we tested whether patients with Li-Fraumeni syndrome (LFS) have high TP53 classifier scores; with the exception of two patients, all had very high scores (>= 0.70). I wanted to investigate whether these two samples had a germline TP53 alteration and whether the patients have truly been diagnosed with LFS. For the former, @Yiran Guo found the variants below. For the latter, Jenn Mason and Shannon Robbins are going to attempt to track down that information from the sites.
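The check described above (LFS patients against a 0.70 score cutoff) could be sketched as a simple filter. The `Sample` record and its field names are hypothetical stand-ins for the merged histology and classifier-score tables, and the IDs and scores in the usage example are invented.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record joining histology metadata with classifier output.
    biospecimen_id: str
    cancer_predispositions: str
    tp53_score: float


def lfs_below_cutoff(samples, cutoff=0.70):
    """IDs of Li-Fraumeni syndrome samples whose TP53 score is below `cutoff`."""
    return [
        s.biospecimen_id
        for s in samples
        if s.cancer_predispositions == "Li-Fraumeni syndrome" and s.tp53_score < cutoff
    ]


# Invented example values, not project data.
samples = [
    Sample("BS_EXAMPLE1", "Li-Fraumeni syndrome", 0.93),
    Sample("BS_EXAMPLE2", "Li-Fraumeni syndrome", 0.12),
    Sample("BS_EXAMPLE3", "None documented", 0.08),
]
print(lfs_below_cutoff(samples))  # ['BS_EXAMPLE2']
```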
sample_id | Kids_First_Participant_ID | Kids_First_Biospecimen_ID_Tumor_DNA | Kids_First_Biospecimen_ID_RNA | Kids_First_Biospecimen_ID_Normal_DNA | cancer_predispositions | path report | Germline | VAF | link |
---|---|---|---|---|---|---|---|---|---|
7316-2310 | PT_PFP1ZVHD | BS_Z9PKZ4RT | BS_DEHJF4C7 | BS_5FP2H6VW | Li-Fraumeni syndrome | not mentioned | Likely pathogenic NM_000546.6(TP53):c.541C>A (p.Arg181Ser) | 0.5 | ClinVar https://www.ncbi.nlm.nih.gov/clinvar/variation/230764/ |
7316-445 | PT_89XRZBSG | BS_G9MQM1KK | BS_ZD5HN296 | BS_XHT3F34T | Li-Fraumeni syndrome | not mentioned | Pathogenic NM_000546.5(TP53):c.454_466del (p.Pro152fs) | 0.35 | ClinVar https://www.ncbi.nlm.nih.gov/clinvar/variation/231540/ |
Interestingly, both have deleterious germline variants and neither has a somatic variant (that we have found), yet their TP53 scores are very low, indicating functional/non-oncogenic TP53. BS_Z9PKZ4RT and BS_G9MQM1KK also have 2 copies of TP53, and both germline variants are heterozygous. Is the other copy still functional?
Attachments: Germline.P_LP.PT_89XRZBSG.txt, Germline.P_LP.PT_PFP1ZVHD.txt
Update: no germline or somatic SVs for these two tumors.
An additional note on the LFS patients above: tumor purity is very low in both of these samples, which may explain both why we are missing a second somatic hit in TP53 and why we see poor classification / low scores.
| Kids_First_Biospecimen_ID | tumor_fraction |
|---|---|
| BS_G9MQM1KK | 0.165 |
| BS_Z9PKZ4RT | 0.374 |
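To make the purity argument concrete, here is a minimal sketch of the expected variant allele fraction (VAF) in a bulk sample that mixes tumor and normal cells. It assumes a single clonal event, the stated copy numbers in every cell, and no sequencing noise; it is a back-of-the-envelope model, not the project's purity-correction method. With purity p, a clonal heterozygous somatic hit in a diploid region is expected at VAF p/2, while a heterozygous germline variant whose wildtype allele was lost through copy-neutral LOH would shift from 0.5 to (1 + p)/2.

```python
def expected_vaf(purity, mut_tumor, total_tumor, mut_normal, total_normal):
    """Expected VAF for a clonal variant in a tumor/normal cell mixture.

    purity: fraction of cells that are tumor cells (0-1).
    mut_*: mutant allele copies per tumor/normal cell.
    total_*: total allele copies per tumor/normal cell.
    """
    mutant = purity * mut_tumor + (1 - purity) * mut_normal
    total = purity * total_tumor + (1 - purity) * total_normal
    return mutant / total


# Tumor fractions from the table above.
for biospecimen_id, purity in [("BS_G9MQM1KK", 0.165), ("BS_Z9PKZ4RT", 0.374)]:
    somatic_het = expected_vaf(purity, 1, 2, 0, 2)     # second hit, diploid: p / 2
    germline_cnloh = expected_vaf(purity, 2, 2, 1, 2)  # germline het + CN-LOH: (1 + p) / 2
    print(f"{biospecimen_id}: somatic ~{somatic_het:.3f}, germline with CN-LOH ~{germline_cnloh:.3f}")
```

Under this model, a clonal second hit in BS_G9MQM1KK would be expected near VAF 0.08, which leaves little signal for detection.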
@jharenza we discussed looking into SVs for samples with high TP53 classifier scores and no SNV/CNV. Should that be another ticket, or should we update this one?
Let's make two new tickets to 1) annotate SVs as another alteration and 2) look at the samples with high scores that don't have TP53 alterations. For 2), we'd look at genes upstream of TP53 and determine whether they have alterations, e.g., MDM2 amplification. I think we may hold off on 2) for the first submission, but I want to capture it in a ticket.
I created ticket #953 for point 1 in the comment above; please update it if I have missed something. Thanks!
This was completed in #841, #922, and #945.
What analysis module should be updated and why?
tp53_nf1_score was never fully completed because of bandwidth.
Since we would like to assess whether or not TP53 alterations are likely functional within #807, it seems like a better place to put this would be within the tp53_nf1_score module.
For reference on where this was left, see this PR comment by @jaclyn-taroni, this comment by @sjspielman, and #720.
What changes need to be made? Please provide enough detail for another participant to make the update.
Referencing #807, we should assess the likely functionality of TP53 alterations before calling them "altered" or "wildtype" for AUROC assessment of the classifier.
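For context on why the labels matter: AUROC is computed from the scores of samples labeled "altered" versus "wildtype", so moving a sample between groups after the functionality review changes the result directly. Below is a minimal, dependency-free sketch using the Mann-Whitney formulation of AUROC (ties counted as one half); the scores and labels are invented, and the project itself may compute AUROC with a different library.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random altered sample (label 1)
    scores higher than a random wildtype sample (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one altered and one wildtype sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: relabeling one low-scoring "altered" sample as wildtype
# (e.g. after judging its alteration non-functional) changes the AUROC.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
print(round(auroc(scores, [1, 1, 0, 1, 0]), 3))  # 0.833
print(auroc(scores, [1, 1, 0, 0, 0]))            # 1.0
```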
First
QC: Perform a correlation between RNA-Seq expression values and TP53 classifier scores. Are these inversely correlated as we would expect?
Second
Reduce the CNV list by focusing on CNVs which delete one or more of TP53's functional domains.
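One way to implement this filter is an interval-overlap test between each deletion segment and the Pfam domain coordinates found earlier in this issue. This is a sketch under stated assumptions: segments are already restricted to copy-number losses, both sources use the same hg38 coordinate convention with inclusive ends, and any overlap of at least 1 bp counts as deleting a domain (requiring full domain coverage instead would be a one-line change). The function name `deleted_domains` is hypothetical.

```python
# TP53 Pfam domains on chr17 (hg38) as (start, end), from the bioMart/UCSC
# Pfam lookup earlier in this issue.
TP53_DOMAINS = {
    "P53_TAD": (7676390, 7676582),
    "P53": (7673755, 7676387),
    "P53_tetramer": (7670637, 7673573),
}


def deleted_domains(chrom, seg_start, seg_end, domains=TP53_DOMAINS, domain_chrom="17"):
    """Return the names of domains that a deletion segment overlaps by >= 1 bp."""
    # Accept both "chr17" and "17" chromosome naming.
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    if chrom != domain_chrom:
        return []
    # Two inclusive intervals overlap when each starts before the other ends.
    return [
        name
        for name, (dom_start, dom_end) in domains.items()
        if seg_start <= dom_end and dom_start <= seg_end
    ]


print(deleted_domains("chr17", 7675000, 7680000))  # ['P53_TAD', 'P53']
print(deleted_domains("17", 7600000, 7650000))     # []
```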
Something to keep in mind for samples which may have low TP53 scores but have alterations: TP63 and TP73 are homologues that can be functionally redundant and are rarely mutated, so if intact, they might compensate. In addition, for samples which may have high TP53 scores but no alterations, we can check for amplification of MDM2, TP53's most potent negative regulator (see the above paper for reference).
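The two checks above (possible TP63/TP73 compensation for low-scoring altered samples, and MDM2 amplification for high-scoring samples without TP53 alterations) could be expressed as a small triage function. The boolean inputs, the 0.5 cutoff (borrowed from the score cutoff proposed for annotating LFS samples), and the returned labels are all hypothetical; a label is a prompt for manual review, not a call.

```python
from typing import Optional


def triage_discordant(score: float, tp53_altered: bool, mdm2_amplified: bool,
                      tp63_tp73_intact: bool, cutoff: float = 0.5) -> Optional[str]:
    """Suggest a follow-up for samples whose TP53 score and alteration status disagree.

    Returns None when the score and the alteration status agree.
    """
    high_score = score > cutoff
    if high_score and not tp53_altered:
        # TP53 may be inactivated indirectly through its negative regulator MDM2.
        return "MDM2_amplified" if mdm2_amplified else "unexplained_high_score"
    if not high_score and tp53_altered:
        # Intact homologues could potentially compensate for altered TP53.
        return "possible_TP63_TP73_compensation" if tp63_tp73_intact else "unexplained_low_score"
    return None


print(triage_discordant(0.85, False, True, True))  # MDM2_amplified
print(triage_discordant(0.20, True, False, True))  # possible_TP63_TP73_compensation
print(triage_discordant(0.90, True, False, True))  # None
```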
Third
We can start by annotating `TP53 altered - loss` if the following conditions are met:
- `cancer_predispositions == "Li-Fraumeni syndrome"`, suggesting there is a germline variant in addition to the somatic variant we observe.
- `cancer_predispositions == "Li-Fraumeni syndrome"` and the TP53 classifier score for the matched RNA-Seq sample is > 0.5 (or a higher cutoff we decide upon later).

Fourth
We can annotate `TP53 altered - activated` if a sample contains one of the two TP53 activating mutations, R273C or R248W (see the two linked references).

Fifth
Either assess and potentially annotate as TP53 altered or perform AUROC on above samples, then assess the below:
What input data should be used? Which data were used in the version being updated?
When do you expect the revised analysis will be completed?
2-2.5 weeks?
Who will complete the updated analysis?
@kgaonkar6, @jharenza will review throughout