Closed kgaonkar6 closed 3 years ago
After updating to use all copy number losses, including those in samples with ploidy >2, via https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1073/commits/96a5d54e7ad701ab9077d701bcc0778d01029650, we now have only the 7 samples whose tp53_altered status was updated by using the hotspot MAF in addition to the consensus MAFs:

| | sample_id | Kids_First_Biospecimen_ID_DNA | Kids_First_Biospecimen_ID_RNA | cancer_predispositions_latest | tp53_score | SNV_indel_counts_latest | CNV_loss_counts_latest | SV_counts_latest | HGVSp_Short_latest | CNV_loss_evidence_latest | SV_type_latest | hotspot_latest | activating_latest | tp53_altered_latest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 7316-1746 | BS_68TZMZH1 | BS_0RQ4P069 | None documented | 0.4755520 | 1 | 0 | 0 | p.Y163C | NA | NA | 1 | 0 | loss |
| 2 | 7316-2189 | BS_02YBZSBY | BS_HJRTC9JQ | Other inherited conditions NOS | 0.6618956 | 2 | 0 | 0 | p.R306*, p.R273C | NA | NA | 1 | 1 | activated |
| 3 | 7316-2753 | BS_WDTT7PG2 | BS_YMAJC22S | None documented | 0.6804005 | 1 | 0 | 0 | p.Y236Hfs*8 | NA | NA | 0 | 0 | loss |
| 4 | 7316-3221 | BS_FK3B5SDH | NA | NA | NA | 1 | 0 | 0 | p.L265P | NA | NA | 1 | 0 | loss |
| 5 | 7316-3631 | BS_ST3Z2B9B | BS_NGHK9RZP | None documented | 0.1675364 | 2 | 0 | 0 | p.X261_splice, p.X307_splice | NA | NA | 0 | 0 | loss |
| 6 | 7316-3920 | BS_E0S2Y0TS | NA | None documented | NA | 2 | 0 | 0 | p.X261_splice, p.X307_splice | NA | NA | 0 | 0 | loss |
| 7 | 7316-901 | BS_1JGQPJH3 | BS_A3QZB9Y2 | None documented | 0.3239602 | 2 | 0 | 0 | p.X187_splice, p.X261_splice | NA | NA | 0 | 0 | loss |
This is not related to the hotspot_maf/consensus CNV updates, but while going through the results I also found the following case, where the SNV is "activating" but the sample also has a CNV loss. Currently, tp53_altered == "activated" is assigned to any sample_id that has an activating SNV at protein position c("273","248"), without considering whether a CNV loss exists. Does this sound OK?

| | sample_id | Kids_First_Biospecimen_ID_DNA | Kids_First_Biospecimen_ID_RNA | cancer_predispositions | tp53_score | SNV_indel_counts | CNV_loss_counts | SV_counts | HGVSp_Short | CNV_loss_evidence | SV_type | hotspot | activating | tp53_altered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 7316-3058 | BS_P0QJ1QAH | BS_D29RPBSZ | Other inherited conditions NOS | 0.8367765 | 1 | 1 | 0 | p.R273H | 1 | NA | 1 | 1 | activated |
| 2 | 7316-388 | BS_823V5X6Z | BS_RX1YTZ7F | None documented | 0.5277632 | 1 | 1 | 0 | p.R248W | 2 | NA | 1 | 1 | activated |
| 3 | 7316-461 | BS_P4K6WK9Y | BS_TRKH2SPE | None documented | 0.9429763 | 1 | 1 | 0 | p.R273H | 1 | NA | 1 | 1 | activated |
| 4 | 7316-956 | BS_MWZCP1XW | BS_B9V8RGTA | Other inherited conditions NOS | 0.9611167 | 1 | 1 | 0 | p.R273C | 1 | NA | 1 | 1 | activated |
I did notice that last night as well and I am ok with that logic.
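For reference, the rule agreed on above can be sketched in a few lines. This is a Python illustration only, not the module's R code: the function names `protein_position`, `has_activating_snv`, and `is_called_activated` are hypothetical, and the only logic taken from this thread is that an SNV at protein position 273 or 248 yields "activated" whether or not a CNV loss is present.

```python
import re

# Protein positions treated as activating in this thread: c("273", "248").
ACTIVATING_POSITIONS = {273, 248}


def protein_position(hgvsp_short):
    """Return the first integer in an HGVSp_Short string such as 'p.R273H', or None."""
    match = re.search(r"\d+", hgvsp_short)
    return int(match.group()) if match else None


def has_activating_snv(hgvsp_short_field):
    """True if any comma-separated variant sits at an activating position."""
    variants = [v.strip() for v in hgvsp_short_field.split(",")]
    return any(protein_position(v) in ACTIVATING_POSITIONS for v in variants)


def is_called_activated(hgvsp_short_field, cnv_loss_counts):
    """Hypothetical helper mirroring the rule under discussion.

    cnv_loss_counts is accepted but intentionally ignored, because the
    current logic does not consider whether a CNV loss exists.
    """
    return has_activating_snv(hgvsp_short_field)


# 7316-3058 from the table above: p.R273H with one CNV loss.
print(is_called_activated("p.R273H", cnv_loss_counts=1))  # True
```

Keeping `cnv_loss_counts` in the signature makes the design choice explicit: if the group later decides that a co-occurring CNV loss should change the call, this is the one place that would need to change.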
After the update to use Freec as the default for copy_number, the copy_number for samples with ploidy >=3 has mostly changed to 2. That is a loss relative to the ploidy, but we miss these calls because our filter keeps only loss calls with 1 or 0 copies.
Will you add this analysis and the plot to `05-tp53-altered-annotation.Rmd` and also update the notes at the top of the notebook to describe this?
Did you mean update the code in `03-tp53-cnv-loss-domain.Rmd`? That is the script that gathers the CNV losses; `05-tp53-altered-annotation.Rmd` only aggregates all the alterations. I did remove the previous documentation about using only <=1 copy number calls as CNV losses, since we now use all losses after reviewing that all copy number states show high inactivation (image). I can add specific documentation that this filter was updated because we are now using controlfreec as the default instead of cnvkit.
Also, I am not seeing those samples in the latest `tp53_altered_status.tsv`. I'm not seeing that code change for the updated filter, either. I do see the samples removed due to the new CN consensus file in `loss_overlap_domains_tp53.tsv`.
Could you pull the latest changes in this PR? I do see the updated copy number and the samples back in `tp53_altered_status.tsv` and `loss_overlap_domains_tp53.tsv`. For example, BS_2J4FG4HV has copy number 2 and is gathered as a loss because its ploidy is 3. Did I miss something?
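To make the difference between the two filters concrete, here is a minimal Python sketch, again an illustration and not the module's R code. It assumes that "loss" means a copy number below the sample's ploidy; the function names and the two rows labeled hypothetical are made up for the example. Only BS_2J4FG4HV (copy number 2, ploidy 3) comes from this thread.

```python
def is_loss_old_filter(copy_number):
    """Earlier filter: keep only TP53 losses called with 1 or 0 copies."""
    return copy_number <= 1


def is_loss_relative_to_ploidy(copy_number, ploidy):
    """Assumed definition of a loss: fewer copies than the sample's ploidy."""
    return copy_number < ploidy


calls = [
    # From the comment above.
    {"biospecimen_id": "BS_2J4FG4HV", "copy_number": 2, "ploidy": 3},
    # Hypothetical rows, included only to show calls the two filters agree on.
    {"biospecimen_id": "hypothetical_A", "copy_number": 1, "ploidy": 2},
    {"biospecimen_id": "hypothetical_B", "copy_number": 2, "ploidy": 2},
]

# Losses relative to ploidy that the old <=1 copy filter would have dropped.
missed = [
    c["biospecimen_id"]
    for c in calls
    if is_loss_relative_to_ploidy(c["copy_number"], c["ploidy"])
    and not is_loss_old_filter(c["copy_number"])
]
print(missed)  # ['BS_2J4FG4HV']
```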
Finally, we also forgot to add TP53 fusions here as additional evidence. There is only one sample with one, `BS_NJ4WPQVK`, and it has a classifier score of 0.81, so we should capture this as a loss as well.
Sure, I can add the fusion update in a different PR.
Purpose/implementation Section
What scientific question is your analysis addressing?
Update tp53-nf1-score with hotspots maf + consensus maf and latest consensus CNV file.
What was your approach?
What GitHub issue does your pull request address?
1072
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Should we update the TP53 CNV loss filtering process?
Previously (with cnvkit copy_number as the default) we retained only TP53 CNV losses with 1 or 0 copies, because many calls with copy_number == 2 were labeled as losses (from neutral calls being assigned 2 copies).
After the update to use Freec as the default for copy_number, the copy_number for samples with ploidy >=3 has mostly changed to 2. That is a loss relative to the ploidy, but we miss these calls because our filter keeps only loss calls with 1 or 0 copies. Here's a snippet of the TP53 loss calls that are missed:
This is the distribution of the TP53 loss calls
Is there anything that you want to discuss further?
Can we add `consensus_seg_with_status.tsv` as output in the focal-cn-file-preparation module so I don't have to run the script here?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
tables
What is your summary of the results?
tp53_alt_status_change.txt
Reproducibility Checklist
Documentation Checklist
`README` and it is up to date.
`analyses/README.md` and the entry is up to date.