dms-vep / dms-vep-pipeline

Pipeline for analyzing deep mutational scanning (DMS) of viral entry proteins (VEPs)

Functional score scale #124

Closed Bernadetadad closed 1 year ago

Bernadetadad commented 1 year ago

Is it possible to adjust the colour scale in functional score heatmaps so that it tolerates outliers a bit better? E.g., in my H5 data the sites with the highest functional scores are the ones that have a very high functional score in just one of the libraries, but the whole heatmap scale moves to accommodate them.

jbloom commented 1 year ago

I don't totally follow the request here, @Bernadetadad. I am looking at this heatmap: https://dms-vep.github.io/Flu_H5_turkey-Indiana-2022-H5N1_DMS/muteffects_observed_heatmap.html

Right now the mutations with the highest functional scores are things like G412I that have scores of ~2.5, and that is the max of the color scale.

What is the specific feature of interest? Do you want the ability to clip large functional scores to a maximum value?

Bernadetadad commented 1 year ago

I think my issue is that most of those high scores are so high because only one of the biological replicates is very high. I know the scale is based on the median, but I think it still overemphasizes those sites because that one library has a significantly higher value. I will have more functional data replicates for H5 soon, so maybe that will solve this, but I also wonder whether, in this case, it would be better to set the scale maximum based on the lower-value replicate for the site with the maximum functional score.

[Image: Screen Shot 2023-02-09 at 11 57 02 AM]

jbloom commented 1 year ago

OK, so in this case the request is essentially for an additional summary-statistic option. Right now we have mean and median; you are asking to also add one that would be the measurement with the minimum absolute value across libraries. Is that a fair summary?

(By the way, although I know it increases work a lot, this is why three libraries could be desirable at least for functional scores: the median effectively eliminates outliers for n > 2 but not for n = 2, since median and mean are the same for n = 2.)
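To make the n = 2 versus n > 2 point concrete, here is a minimal sketch using only the Python standard library. The scores are made-up illustrative numbers, and `min_abs` is a hypothetical helper for the "minimum absolute value" statistic discussed above, not an existing pipeline option.

```python
from statistics import mean, median


def min_abs(values):
    """Return the measurement with the smallest absolute value (hypothetical statistic)."""
    return min(values, key=abs)


# Made-up functional scores for one mutation; one library is an outlier (2.5).
two_libs = [0.5, 2.5]
three_libs = [0.5, 0.75, 2.5]

for label, scores in [("n=2", two_libs), ("n=3", three_libs)]:
    print(
        label,
        "mean:", mean(scores),
        "median:", median(scores),
        "min_abs:", min_abs(scores),
    )
```

With two libraries the mean and median coincide, so the outlier pulls both up equally. With three libraries the median falls back to the middle measurement while the mean is still inflated.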

Bernadetadad commented 1 year ago

yes.

jbloom commented 1 year ago

@Bernadetadad, in this pull request to polyclonal I have added the parameters heatmap_max_fixed and heatmap_min_fixed which you can use to manually fix the color scale. I think this is a better solution than taking minimum values across replicates.

Hopefully in practice this can be resolved mostly with better experiments that have less noise or more replicates, but now you will have a solution. That said, I recommend fixing the limits sparingly and only as needed, since doing so without care could distort the appearance of the data.
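As a rough illustration of what fixing the color-scale limits does, and why it can distort the picture, here is a small self-contained sketch. It is not polyclonal's implementation: `color_limits` and `scale_position` are hypothetical helpers, the scores are made up, and the only names taken from this thread are the ideas behind `heatmap_max_fixed` and `heatmap_min_fixed`.

```python
def color_limits(values, fixed_min=None, fixed_max=None):
    """Return (lo, hi) for a color scale; a fixed limit overrides the data range."""
    lo = min(values) if fixed_min is None else fixed_min
    hi = max(values) if fixed_max is None else fixed_max
    if lo >= hi:
        raise ValueError("lower limit must be below upper limit")
    return lo, hi


def scale_position(value, lo, hi):
    """Map a value to [0, 1] along the color scale, clamping values outside the limits."""
    clamped = min(max(value, lo), hi)
    return (clamped - lo) / (hi - lo)


# Made-up functional scores; 2.5 stands in for a single-library outlier.
scores = [-4.0, -1.0, 0.0, 1.0, 2.5]

# Data-driven limits: the outlier sets the top of the scale.
auto_lo, auto_hi = color_limits(scores)

# Fixed maximum: everything at or above 1.0 gets the same (top) color.
fixed_lo, fixed_hi = color_limits(scores, fixed_max=1.0)

for s in scores:
    print(s, scale_position(s, auto_lo, auto_hi), scale_position(s, fixed_lo, fixed_hi))
```

With the fixed maximum, the scores 1.0 and 2.5 become indistinguishable on the scale, which is the kind of distortion to keep in mind when choosing limits.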

The changes to polyclonal will probably be in the new version 3.4, and I will close this issue once the new polyclonal is merged into dms-vep-pipeline.