**Closed** · sjspielman closed this issue 2 years ago
@jharenza, I'm wondering if I can get some feedback from you as I start this! In our current figure paneling outline (the Google Slides), I'm looking at figure 5D, which is this figure in the repo.
I don't think we can/should use this figure for the manuscript, because one of the groups ("other") has only N=2, which means the statistical comparison is not reliable. Is there another biological grouping (this is Li-Fraumeni) you might recommend for specific TP53 score comparisons? I can investigate and ensure appropriate sample sizes from there. Otherwise, I suggest we remove D altogether.
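To make the N=2 concern concrete: with only two samples in a group, an exact two-sided Wilcoxon rank-sum test has a hard floor on its achievable p-value, so even perfect separation is barely distinguishable from chance. A quick illustration with hypothetical scores (Python/scipy rather than the project's R code):

```python
from scipy.stats import mannwhitneyu

# Hypothetical classifier scores: a group of 2 completely separated
# from a group of 8 (numbers are illustrative, not project data).
group_a = [0.1, 0.2]                                    # N = 2
group_b = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.97, 0.99]  # N = 8

# With N = 2 vs N = 8, there are C(10, 2) = 45 possible rank
# arrangements, so even perfect separation cannot yield a two-sided
# exact p-value below 2/45 ~ 0.044.
res = mannwhitneyu(group_a, group_b, alternative="two-sided", method="exact")
print(res.pvalue)  # ~0.044, right at the significance boundary
```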
Another question that has come up is why, in panels B/C, different groupings were used for visualization and statistical comparison. Specifically, panel B, which plots expression, does not consider "other" (the vast majority of points in the data), but panel C does. When the "other" grouping is included in the expression comparison, the p-values change substantially: for just a two-group activated/loss comparison, P=0.0035, but with Bonferroni correction for comparing 3 groups, the p-value jumps to 0.011:
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method |
|---|---|---|---|---|---|---|---|
| TP53 | other | loss | 0.003595971 | 0.011 | 0.0036 | ** | Wilcoxon |
| TP53 | other | activated | 0.017650775 | 0.053 | 0.0177 | * | Wilcoxon |
| TP53 | loss | activated | 0.003519522 | 0.011 | 0.0035 | ** | Wilcoxon |
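As a sanity check (a quick sketch, not project code), the `p.adj` column above is consistent with a Bonferroni correction over the three pairwise tests, i.e. each raw p-value multiplied by 3 and capped at 1:

```python
# Reproduce the p.adj column above via Bonferroni correction.
# Raw p-values are copied from the table; this is not project code.
raw_p = {
    ("other", "loss"): 0.003595971,
    ("other", "activated"): 0.017650775,
    ("loss", "activated"): 0.003519522,
}

n_tests = len(raw_p)  # 3 pairwise comparisons
p_adj = {pair: min(1.0, p * n_tests) for pair, p in raw_p.items()}

for pair, p in p_adj.items():
    print(pair, round(p, 3))
# ('other', 'loss') 0.011
# ('other', 'activated') 0.053
# ('loss', 'activated') 0.011
```

These match the reported adjusted values, so the jump from 0.0035 to 0.011 is exactly the multiple-comparison penalty.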
Leaving a note here for future reference and possible discussion - when `05-tp53-altered-annotation.Rmd` is re-rendered, one of its output files, `tp53_altered_status.tsv`, does have some very minor diffs, which are generally at the last decimal places of long decimals, e.g. `0.23057803555397383` vs. `0.23057803555397385`.
Not entirely sure what might have caused this since there doesn't appear to be any stochasticity in the script. Possibly tiny changes in data?
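For reference, diffs of this size sit at the resolution limit of a 64-bit float: the two printed values differ by at most one representable step (unit in the last place, or "ulp"). A quick check in Python (the project scripts are R, but both use the same IEEE 754 double format):

```python
import math

a = 0.23057803555397383  # value from one render
b = 0.23057803555397385  # value from the re-render

# The spacing between adjacent doubles near 0.23 is ~2.8e-17, so the
# two printed values are at most one representable step apart.
print(math.ulp(a))                # ~2.8e-17
print(abs(a - b) <= math.ulp(a))  # True
```

In other words, this is the smallest possible change a computation on these values could produce, which points toward tiny arithmetic differences rather than a data change.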
> @jharenza, I'm wondering if I can get some feedback from you as I start this! In our current figure paneling outline (the Google Slides), I'm looking at figure 5D, which is this figure in the repo.
>
> I don't think we can/should use this figure for the manuscript, because one of the groups ("other") has only N=2, which means the statistical comparison is not reliable. Is there another biological grouping (this is Li-Fraumeni) you might recommend for specific TP53 score comparisons? I can investigate and ensure appropriate sample sizes from there. Otherwise, I suggest we remove D altogether.
Sorry for the delay on this.
I agree this is a bit tricky because of the N = 2 here, which is why the points are shown. The point of this panel is to serve as a positive control for the classifier's predictions of oncogenic TP53. Li-Fraumeni syndrome (LFS) is a cancer predisposition syndrome defined by a germline loss-of-function variant in TP53. The data chosen for this panel were all samples annotated as having this predisposition. Thus, we wanted to see whether the classifier predicted these patients to have inactivated (or, as we now call it, oncogenic) TP53, as indicated by high scores. The answer was yes, except in the case of two samples which, because of how we annotated "loss", "activated", and "other", ended up annotated as "other". Here are a few other ideas:
Thoughts on this?
My sense is we should be consistent with the comparisons in these plots, and therefore the associated statistics, especially if we want these panels adjacent.
I see what you are saying - I hadn't thought about that, since I was not thinking about comparing expression of "other", which is really going to be a mix of loss and wildtype. That is a tricky, unresolved group. But we can definitely add it to plot C. The main comparison asks whether we see higher expression in samples with activating mutations vs. those with alterations resulting in supposed loss of function. It could become confusing for reviewers/readers, but we can include it.
I think this happens with every rerun of the classifier - @gwaygenomics would have more input.
Looping in @runjin326 here for the correlation plots
> I think this happens with every rerun of the classifier
You mean the decimal changes? I'm prepared to accept that!
My sense is we are limited for any sort of Li-Fraumeni exploration, unfortunately. My point about N=2 was more about the validity of performing statistics at all - without N>=3 (and preferably many more), the comparison simply isn't reliable. Moreover, only 10 samples in PBTA are annotated with this predisposition, so we are inevitably up against severely imbalanced data for any comparison between LFS and everything else. Among all non-NA predispositions, we have the following, with the vast majority (N=914) having no documented predisposition:
```
   cancer_predispositions                             n
   <chr>                                          <int>
 1 Li-Fraumeni syndrome                              10
 2 NF-1                                              47
 3 NF-1,Other inherited conditions NOS                3
 4 NF-2                                               3
 5 NF-2,Schwannomatosis                              12
 6 None documented                                  914
 7 Other inherited conditions NOS                    37
 8 Other inherited conditions NOS,Schwannomatosis     1
 9 Schwannomatosis                                    6
10 Tuberous Sclerosis                                12
```
Maybe a good middle ground is to focus only on samples with known predispositions of some kind, and in the text we can contextualize the 2 low LFS points. A quick version of what I have in mind is attached.
predisp.pdf
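A sketch of the filtering I have in mind, with the counts copied from the table above (column names are illustrative, and this is Python/pandas rather than the project's R): drop "None documented" plus any predisposition group too small to support statistics.

```python
import pandas as pd

# Counts copied from the predisposition table above.
# Column names are illustrative, not the project's actual schema.
counts = pd.DataFrame({
    "cancer_predispositions": [
        "Li-Fraumeni syndrome", "NF-1",
        "NF-1,Other inherited conditions NOS", "NF-2",
        "NF-2,Schwannomatosis", "None documented",
        "Other inherited conditions NOS",
        "Other inherited conditions NOS,Schwannomatosis",
        "Schwannomatosis", "Tuberous Sclerosis",
    ],
    "n": [10, 47, 3, 3, 12, 914, 37, 1, 6, 12],
})

MIN_N = 3  # below this, group statistics are not meaningful
keep = counts[
    (counts["cancer_predispositions"] != "None documented")
    & (counts["n"] >= MIN_N)
]
print(keep["cancer_predispositions"].tolist())
```

With this cutoff, 8 of the 10 groups survive; only "None documented" and the single-sample group are dropped.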
> I think this happens with every rerun of the classifier
>
> You mean the decimal changes? I'm prepared to accept that!
Yes!
> My sense is we are limited for any sort of Li-Fraumeni exploration, unfortunately. My point about N=2 was more about the validity of performing statistics at all - without N>=3 (and preferably many more), the comparison simply isn't reliable. Moreover, only 10 samples in PBTA are annotated with this predisposition, so we are inevitably up against severely imbalanced data for any comparison between LFS and everything else. Among all non-NA predispositions, we have the following, with the vast majority (N=914) having no documented predisposition:
>
> ```
>    cancer_predispositions                             n
>    <chr>                                          <int>
>  1 Li-Fraumeni syndrome                              10
>  2 NF-1                                              47
>  3 NF-1,Other inherited conditions NOS                3
>  4 NF-2                                               3
>  5 NF-2,Schwannomatosis                              12
>  6 None documented                                  914
>  7 Other inherited conditions NOS                    37
>  8 Other inherited conditions NOS,Schwannomatosis     1
>  9 Schwannomatosis                                    6
> 10 Tuberous Sclerosis                                12
> ```
>
> Maybe a good middle ground is to focus only on samples with known predispositions of some kind, and in the text we can contextualize the 2 low LFS points. A quick version of what I have in mind is attached.
That plot is fine, but simply not running stats on the LFS patients is also fine. We just want to use them as a control. We could even simply note, from the supplementary table of scores, that all except two patients had scores > x, validating the classifier in a way other than using somatic variants.
> when `05-tp53-altered-annotation.Rmd` is re-rendered, one of its output files `tp53_altered_status.tsv` does have some very minor diffs
Hmm, to me this seems like an R version issue with `readr` or `dplyr`. That `.Rmd` file isn't rerunning the classifier, so any diff must be internal to the R script.
That being said, I wouldn't be shocked if rerunning the classifier caused diffs like these. Re-training the classifier will certainly cause diffs, and re-applying it might as well, as a result of Python float-handling differences if versions aren't pinned.
Do you also see changes in the session info @sjspielman? Because it could be a difference between running it in the project container and not doing that.
Unfortunately, no `sessionInfo()` was originally included in the notebook, so I can't say much about the environment it was originally run in.
Something I can do, though, is run the notebook a few times within Docker and see if it changes in between those runs. This will sort of approximate whether the issue is from versioning.
> Something I can do, though, is run the notebook a few times within Docker and see if it changes in between those runs. This will sort of approximate whether the issue is from versioning.
That sounds good. I'd also add `sessionInfo()` to the notebook as part of whatever PR addresses everything else!
Five different runs within Docker produced no diffs, so this strongly suggests a versioning issue - the current results were likely previously generated outside of Docker. I'll put in a PR for this shortly.
Closing in favor of #1271 and #1272
The manuscript Figure 5 needs to be finalized. This will entail, at a minimum:
Who will complete the updated analysis?
@sjspielman