jbloom closed this issue 1 year ago.
I think it's maybe more confounded by the fact that the number of mutations per category is very different. Do you think a violin plot with actual dots for each mutation would be better (though it could be too dense to read)? I guess we could also just state explicitly somewhere that even though the RBD seems to tolerate quite a few deletions, we specifically picked ones that have been observed (if I recall correctly, in some Ukrainian sequences that Ryan found), so we biased our selection toward tolerated deletions.
The color scheme is fine either way. I have a color code for each domain for the paper, and it takes no time to change in Illustrator.
I can add a filter on the minimum number of mutations a category must have to be shown, like this: [plot not included]
But I'm not sure that is better, since it is mostly driven by biases in which mutations we picked. I think the main things this plot would show are that stop codons are worse than deletions, which are worse than substitutions, plus some sense of the range of effects. The other plot did capture stop deletions being worse than RBD / NTD deletions, which is probably real.
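The ordering described above (stop codons worse than deletions, deletions worse than substitutions) can also be checked numerically by ranking mutation types by their median effect and reporting the range. A minimal Python sketch, assuming the effects are held in a hypothetical dict mapping each mutation type to a list of functional-effect scores, and assuming more negative scores mean more deleterious effects; `summarize_effects` and `rank_by_median` are illustrative names, not functions from the analysis notebook.

```python
from statistics import median


def summarize_effects(effects_by_type):
    """Return {mutation_type: (median, min, max)} of functional effects.

    `effects_by_type` is a hypothetical dict mapping a mutation type
    (e.g. "stop", "deletion", "substitution") to a list of effect scores.
    Empty lists are skipped because median() raises on empty input.
    """
    return {
        mut_type: (median(vals), min(vals), max(vals))
        for mut_type, vals in effects_by_type.items()
        if vals
    }


def rank_by_median(effects_by_type):
    """Return mutation types ordered from most to least deleterious median.

    Assumes more negative scores mean more deleterious effects.
    """
    summary = summarize_effects(effects_by_type)
    return sorted(summary, key=lambda mut_type: summary[mut_type][0])
```

The median is used rather than the mean so that a few extreme mutations in a small category do not shift its position; it does not, however, correct for the biased choice of which mutations were included.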
I played around with point plots, but they don't look good because the number of mutations differs so much among categories.
What do you think? I think we basically need to go with the original plot (or this one with a filter on the number of mutations per category) or with no distribution plot.
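The minimum-count filter mentioned above amounts to grouping per-mutation effects by category and dropping categories with too few members before plotting. A minimal Python sketch, assuming the effects are available as (category, effect) pairs; `filter_categories` and `min_n` are hypothetical names, not taken from the analysis notebook.

```python
from collections import defaultdict


def filter_categories(effects, min_n):
    """Group per-mutation effects by category, keeping well-sampled categories.

    `effects` is an iterable of (category, effect) pairs, one per mutation,
    e.g. ("RBD deletion", score). Categories with fewer than `min_n`
    mutations are dropped so that sparsely sampled categories are not
    shown alongside well-sampled ones.
    """
    by_category = defaultdict(list)
    for category, effect in effects:
        by_category[category].append(effect)
    return {
        category: vals
        for category, vals in by_category.items()
        if len(vals) >= min_n
    }
```

The choice of `min_n` is a judgment call: raising it hides sparse categories, but it does not remove the bias from which mutations were picked for the libraries.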
If we included this, we could say that we only included key deletions, and then maybe even have a supplement that specifically calls out ones like del483.
Let's go with the original one. It fits what we want to say in the text, and we can add a note in the legend (I think we already do in the main text) about the biased selection of mutations in our libraries. We already call out del483 as tolerated in the text, but as always we should add a link to the interactive heat map for people to explore.
@Bernadetadad, the histograms have been added to this notebook: https://dms-vep.github.io/SARS-CoV-2_XBB.1.5_spike_DMS/notebooks/func_effects_dist.html
@Bernadetadad, here is a draft plot showing the distribution of mutation effects for Fig. 2.
Does this look OK?
It is a bit confounded because the subsets of mutations differ among libraries, so it is not necessarily representative of tolerance to all mutations in those domains, but could it still be useful to show?
Also, is the color code OK or should I change it?