**Closed** · sjspielman closed this issue 2 years ago
@jharenza, I'm wondering if I can get some feedback from you as I start this! In our current figure paneling outline (the Google Slides), I'm looking at figure 5D, which is this figure in the repo.
I don't think we can/should use this figure for the manuscript, because one of the groups ("other") has only N=2, which means the statistical comparison is not reliable. Is there another biological grouping (this is Li-Fraumeni) you might recommend for specific TP53 score comparisons? I can investigate and ensure appropriate sample sizes from there. Otherwise, I suggest we remove D altogether.
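To make the N=2 concern concrete: with only two samples in a group, an exact two-sided Wilcoxon rank-sum test has a hard floor on its achievable p-value, so even perfect separation is barely distinguishable from chance. A quick illustration with hypothetical scores (Python/scipy rather than the project's R code):

```python
from scipy.stats import mannwhitneyu

# Hypothetical classifier scores: a group of 2 completely separated
# from a group of 8 (numbers are illustrative, not project data).
group_a = [0.1, 0.2]                                    # N = 2
group_b = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.97, 0.99]  # N = 8

# With N = 2 vs N = 8, there are C(10, 2) = 45 possible rank
# arrangements, so even perfect separation cannot yield a two-sided
# exact p-value below 2/45 ~ 0.044.
res = mannwhitneyu(group_a, group_b, alternative="two-sided", method="exact")
print(res.pvalue)  # ~0.044, right at the significance boundary
```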
Another question that has come up is why, in panels B/C, different groupings were used for visualization and statistical comparison. Specifically, panel B, which plots expression, does not consider "other" (the vast majority of points in the data), but panel C does. When the "other" grouping is included in the expression comparison, the p-values change substantially: for just a two-group activated/loss comparison, P=0.0035, but with Bonferroni correction for comparing 3 groups, the p-value jumps to 0.011:
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method |
|---|---|---|---|---|---|---|---|
| TP53 | other | loss | 0.003595971 | 0.011 | 0.0036 | ** | Wilcoxon |
| TP53 | other | activated | 0.017650775 | 0.053 | 0.0177 | * | Wilcoxon |
| TP53 | loss | activated | 0.003519522 | 0.011 | 0.0035 | ** | Wilcoxon |
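As a sanity check (a quick sketch, not project code), the `p.adj` column above is consistent with a Bonferroni correction over the three pairwise tests, i.e. each raw p-value multiplied by 3 and capped at 1:

```python
# Reproduce the p.adj column above via Bonferroni correction.
# Raw p-values are copied from the table; this is not project code.
raw_p = {
    ("other", "loss"): 0.003595971,
    ("other", "activated"): 0.017650775,
    ("loss", "activated"): 0.003519522,
}

n_tests = len(raw_p)  # 3 pairwise comparisons
p_adj = {pair: min(1.0, p * n_tests) for pair, p in raw_p.items()}

for pair, p in p_adj.items():
    print(pair, round(p, 3))
# ('other', 'loss') 0.011
# ('other', 'activated') 0.053
# ('loss', 'activated') 0.011
```

These match the reported adjusted values, so the jump from 0.0035 to 0.011 is exactly the multiple-comparison penalty.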
Leaving a note here for future reference and possible discussion - when `05-tp53-altered-annotation.Rmd` is re-rendered, one of its output files, `tp53_altered_status.tsv`, does have some very minor diffs, which are generally at the last decimal places of long decimals, e.g. `0.23057803555397383` vs. `0.23057803555397385`.
Not entirely sure what might have caused this since there doesn't appear to be any stochasticity in the script. Possibly tiny changes in data?
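For reference, diffs of this size sit at the resolution limit of a 64-bit float: the two printed values differ by at most one representable step (unit in the last place, or "ulp"). A quick check in Python (the project scripts are R, but both use the same IEEE 754 double format):

```python
import math

a = 0.23057803555397383  # value from one render
b = 0.23057803555397385  # value from the re-render

# The spacing between adjacent doubles near 0.23 is ~2.8e-17, so the
# two printed values are at most one representable step apart.
print(math.ulp(a))                # ~2.8e-17
print(abs(a - b) <= math.ulp(a))  # True
```

In other words, this is the smallest possible change a computation on these values could produce, which points toward tiny arithmetic differences rather than a data change.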
> @jharenza, I'm wondering if I can get some feedback from you as I start this! In our current figure paneling outline (the Google Slides), I'm looking at figure 5D, which is this figure in the repo.
>
> I don't think we can/should use this figure for the manuscript, because one of the groups ("other") has only N=2, which means the statistical comparison is not reliable. Is there another biological grouping (this is Li-Fraumeni) you might recommend for specific TP53 score comparisons? I can investigate and ensure appropriate sample sizes from there. Otherwise, I suggest we remove D altogether.
Sorry for the delay on this.
I agree this is a bit tricky because of the N = 2 here, which is why the points are shown. The point of this panel is to serve as a positive control for the classifier's predictions of oncogenic TP53. Li-Fraumeni syndrome (LFS) is a cancer predisposition syndrome defined by a germline loss-of-function variant in TP53. The data chosen for this panel were all samples annotated as having this predisposition. Thus, we wanted to see whether the classifier predicted these patients to have inactivated (or, as we now call it, oncogenic) TP53, as indicated by high scores. The answer was yes, except in the case of two samples which, because of how we annotated "loss", "activated", and "other", ended up annotated as "other". Here are a few other ideas:
Thoughts on this?
My sense is we should be consistent with the comparisons in these plots, and therefore the associated statistics, especially if we want these panels adjacent.
I see what you are saying - I hadn't thought about that, since I was not thinking about comparing expression of "other", which is really going to be a mix of loss and wildtype. That is a tricky, unresolved group. But we can definitely add it to plot C. The main comparison asks whether we see higher expression in samples with activating mutations vs. those with alterations resulting in supposed loss of function. It could become confusing for reviewers/readers, but we can include it.
I think this happens with every rerun of the classifier - @gwaygenomics would have more input.
Looping in @runjin326 here for the correlation plots
> I think this happens with every rerun of the classifier
You mean the decimal changes? I'm prepared to accept that!
My sense is we are limited for any sort of Li-Fraumeni exploration, unfortunately. My point about N=2 was more about the validity of performing statistics at all - without N>=3 (and preferably many more), the comparison simply isn't reliable. Moreover, only 10 samples in PBTA are annotated with this predisposition, so we are inevitably up against severely imbalanced data for any comparison between LFS and everything else. Among all non-NA predispositions, we have the following, with the vast majority (N=914) having no documented predisposition:
```
   cancer_predispositions                             n
   <chr>                                          <int>
 1 Li-Fraumeni syndrome                              10
 2 NF-1                                              47
 3 NF-1,Other inherited conditions NOS                3
 4 NF-2                                               3
 5 NF-2,Schwannomatosis                              12
 6 None documented                                  914
 7 Other inherited conditions NOS                    37
 8 Other inherited conditions NOS,Schwannomatosis     1
 9 Schwannomatosis                                    6
10 Tuberous Sclerosis                                12
```
Maybe a good middle ground is to focus only on samples with known predispositions of some kind, and in the text we can contextualize the 2 low LFS points. A quick version of what I have in mind is attached.
predisp.pdf
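A sketch of the filtering I have in mind, with the counts copied from the table above (column names are illustrative, and this is Python/pandas rather than the project's R): drop "None documented" plus any predisposition group too small to support statistics.

```python
import pandas as pd

# Counts copied from the predisposition table above.
# Column names are illustrative, not the project's actual schema.
counts = pd.DataFrame({
    "cancer_predispositions": [
        "Li-Fraumeni syndrome", "NF-1",
        "NF-1,Other inherited conditions NOS", "NF-2",
        "NF-2,Schwannomatosis", "None documented",
        "Other inherited conditions NOS",
        "Other inherited conditions NOS,Schwannomatosis",
        "Schwannomatosis", "Tuberous Sclerosis",
    ],
    "n": [10, 47, 3, 3, 12, 914, 37, 1, 6, 12],
})

MIN_N = 3  # below this, group statistics are not meaningful
keep = counts[
    (counts["cancer_predispositions"] != "None documented")
    & (counts["n"] >= MIN_N)
]
print(keep["cancer_predispositions"].tolist())
```

With this cutoff, 8 of the 10 groups survive; only "None documented" and the single-sample group are dropped.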
> I think this happens with every rerun of the classifier
>
> You mean the decimal changes? I'm prepared to accept that!
Yes!
> My sense is we are limited for any sort of Li-Fraumeni exploration, unfortunately. My point about N=2 was more about the validity of performing statistics at all - without N>=3 (and preferably many more), the comparison simply isn't reliable. Moreover, only 10 samples in PBTA are annotated with this predisposition, so we are inevitably up against severely imbalanced data for any comparison between LFS and everything else. Among all non-NA predispositions, we have the following, with the vast majority (N=914) having no documented predisposition:
>
> ```
>    cancer_predispositions                             n
>    <chr>                                          <int>
>  1 Li-Fraumeni syndrome                              10
>  2 NF-1                                              47
>  3 NF-1,Other inherited conditions NOS                3
>  4 NF-2                                               3
>  5 NF-2,Schwannomatosis                              12
>  6 None documented                                  914
>  7 Other inherited conditions NOS                    37
>  8 Other inherited conditions NOS,Schwannomatosis     1
>  9 Schwannomatosis                                    6
> 10 Tuberous Sclerosis                                12
> ```
>
> Maybe a good middle ground is to focus only on samples with known predispositions of some kind, and in the text we can contextualize the 2 low LFS points. A quick version of what I have in mind is attached.
That plot is fine, but simply not running stats on the LFS patients is also fine. We just want to use them as a control. We could even simply note, from the supplementary table of scores, that all except two patients had scores > x, validating the classifier in a way other than using somatic variants.
> when `05-tp53-altered-annotation.Rmd` is re-rendered, one of its output files `tp53_altered_status.tsv` does have some very minor diffs
Hmm, to me this seems like an R version issue with `readr` or `dplyr`. That `.Rmd` file isn't rerunning the classifier, so any diff must be internal to the R script.
That being said, I wouldn't be shocked if rerunning the classifier caused diffs like these. Re-training the classifier will certainly cause diffs, and re-applying it might as well, as a result of Python float-handling differences if versions aren't pinned.
Do you also see changes in the session info @sjspielman? Because it could be a difference between running it in the project container and not doing that.
Unfortunately, no `sessionInfo()` was originally included in the notebook, so I can't say much about the environment it was originally run in.
Something I can do, though, is run the notebook a few times within Docker and see if it changes in between those runs. This will sort of approximate whether the issue is from versioning.
> Something I can do, though, is run the notebook a few times within Docker and see if it changes in between those runs. This will sort of approximate whether the issue is from versioning.
That sounds good. I'd also add `sessionInfo()` to the notebook as part of whatever PR addresses everything else!
Five different runs within Docker produced no diffs, so this strongly suggests a versioning issue - the current results were likely previously generated outside of Docker. I'll put in a PR for this shortly.
Closing in favor of #1271 and #1272
The manuscript Figure 5 needs to be finalized. This will entail, at a minimum:
Who will complete the updated analysis?
@sjspielman