dms-vep / dms-vep-pipeline-3

Pipeline for analyzing deep mutational scanning (DMS) of viral entry proteins (VEPs)

enable filtering of measurements that are highly disparate among replicates #78

Closed jbloom closed 1 year ago

jbloom commented 1 year ago

Related to this issue in a data repository, we want to be able to filter out measurements for specific mutations when they are highly disparate among replicates.

jbloom commented 1 year ago

In version 3.5.0, it is now possible to add filtering of measurements that have a very large standard deviation among libraries (replicates).

It is optional whether or not you do this, and you may be fine not doing it unless you notice key measurements that seem unreliable and are highly disparate among libraries.

But @Caleb-Carr @bblarsen-sci @caelanradford, I would suggest that everyone building repos at least check whether they have measurements that seem unreliable, with large standard deviations, and then decide whether to filter them. Recall that the best way to be confident in a DMS measurement is for independent libraries to agree on the effect. When you look at this, use your biological intuition to decide if a measurement is really unreliable. For instance, functional effects of -4 and -7 are probably really about the same, but ones of +1 and -2 are not.

Here I will walk through how you would do this filtering using @Bernadetadad's XBB.1.5 spike repo as an example.

First, the avg_func_effects and avg_antibody_escape rules now output plots that show the standard deviation among replicates versus the mean for each measured mutation. These plots look like this:

[image: plot of standard deviation among replicates versus mean for each measured mutation]

Basically, you can see that a small number of mutations are to the far right, meaning they have very large standard deviations among replicates. If you mouseover them, you can see what mutations those are. If you decide that some of them are truly noisy, you can filter away ones with large standard deviations.
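
As a rough illustration of what those plots summarize (this is not the pipeline's own code), the sketch below computes the mean and standard deviation of each mutation's effect across libraries and lists the most disparate mutations first. The mutation names and values are made up, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# Hypothetical per-library effects; names and values are made up for illustration.
effects_by_mutation = {
    "A100G": [-0.2, -0.1],
    "K200E": [1.1, -1.9],
    "T300I": [-0.6, -0.9],
}


def summarize(effects_by_mutation):
    """Return per-mutation mean and sample std, sorted by std (largest first)."""
    rows = [
        {"mutation": mutation, "mean": mean(values), "std": stdev(values)}
        for mutation, values in effects_by_mutation.items()
        if len(values) >= 2  # std is undefined for a single library
    ]
    return sorted(rows, key=lambda row: row["std"], reverse=True)


for row in summarize(effects_by_mutation):
    print(f"{row['mutation']}: mean={row['mean']:.2f}, std={row['std']:.2f}")
```

In this toy data the mutation whose libraries disagree on the sign of the effect sorts to the top, which is the kind of point that shows up at the far right of the plot.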

For functional effects, this involves making the following additions to avg_func_effects in func_effects_config.yml.

First, we only want to compute the standard deviation after putting a floor on functional effects. For instance, measurements of -7 and -4 may not indicate much error (both are really negative), but values of +1.5 and -1.5 do indicate disagreement over the effect of the mutation. That floor is added by specifying floor_effect_for_std: -2.5 (if you want the floor to be -2.5). (Note that in general you should probably also visualize your functional effects with a floor by using something like heatmap_min_fixed: -2.5, because we don't want the plots to be visually dominated by differences between mutations that are all really bad.)
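
To make the effect of the floor concrete, here is a minimal sketch that clips each replicate's value at the floor before taking the standard deviation. The helper name floored_std is hypothetical, and using the sample standard deviation is an assumption, so treat this as an illustration of the idea and not as the pipeline's implementation.

```python
from statistics import stdev


def floored_std(effects, floor=-2.5):
    """Sample standard deviation after raising every effect below `floor` to `floor`."""
    clipped = [max(effect, floor) for effect in effects]
    return stdev(clipped)


both_deleterious = [-7.0, -4.0]  # both libraries agree the mutation is very bad
disagreeing = [1.5, -1.5]        # libraries disagree on the sign of the effect

# Without a floor, both pairs have the same standard deviation.
print(stdev(both_deleterious), stdev(disagreeing))

# With a floor of -2.5, only the pair that truly disagrees keeps a large value.
print(floored_std(both_deleterious), floored_std(disagreeing))
```

The two pairs differ by the same amount on the raw scale, but after flooring only the +1.5 versus -1.5 pair would be flagged by a standard deviation cutoff.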

See here for an example.

For antibody escape, there is no comparable floor parameter, because for antibody escape both large negative and large positive values are meaningful, and we think -2 and -5 mean different things biologically.

Anyway, the heatmaps and lineplots now have a slider to filter by standard deviation. If you want the default to be anything other than no filtering, you should add something like:

addtl_slider_stats:
  times_seen: 3
  effect_std: 1.6
addtl_slider_stats_as_max: [effect_std]  # because we want this slider to filter as a max rather than min

to the plot_kwargs for the config for the averaging.
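
The intended semantics of these settings can be sketched as follows: a stat listed in addtl_slider_stats_as_max keeps mutations at or below its slider value, while any other stat keeps mutations at or above it. This is only an illustration of that reading and not the plotting code itself; the function name passes_sliders, the example rows, and the inclusive boundaries are all assumptions.

```python
def passes_sliders(stats, slider_cutoffs, as_max=()):
    """Return True if a mutation's stats pass every slider cutoff.

    Stats named in `as_max` must be <= their cutoff; all others must be >= it.
    """
    for name, cutoff in slider_cutoffs.items():
        if name in as_max:
            if stats[name] > cutoff:
                return False
        elif stats[name] < cutoff:
            return False
    return True


# Initial slider values mirroring the config snippet above.
slider_cutoffs = {"times_seen": 3, "effect_std": 1.6}
as_max = ["effect_std"]

# Made-up example mutations.
consistent = {"times_seen": 5, "effect_std": 0.4}
noisy = {"times_seen": 5, "effect_std": 2.1}
rarely_seen = {"times_seen": 2, "effect_std": 0.4}

for label, row in [("consistent", consistent), ("noisy", noisy), ("rarely_seen", rarely_seen)]:
    print(label, passes_sliders(row, slider_cutoffs, as_max))
```

Without the as_max entry, effect_std would be treated as a minimum, which would keep the noisy measurements and drop the consistent ones.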

Additionally, you may want to apply some filtering in summaries_config.yml (probably the same filtering as above). That can be done by adding lines like:

le_filters:
  effect_std: 1.6

in summaries_config.yml for the appropriate assay, as here.