dms-vep / dms-vep-pipeline

Pipeline for analyzing deep mutational scanning (DMS) of viral entry proteins (VEPs)

add antibody count thresholds for prob escape (becomes version 2.4.0) #145

Closed jbloom closed 1 year ago

jbloom commented 1 year ago

This pull request addresses a point noted by @Bernadetadad in #144.

Previously, when computing prob_escape for antibody escape analyses, we imposed only a count threshold on the no-antibody condition to decide which variants to retain. However, some variants have very low no-antibody counts but very high counts in the antibody-selected conditions. These are almost certainly strong escape mutations, and @Bernadetadad's manual inspection of some examples in #144 supports this.

Therefore, we want to change the logic so that variants are retained if they have adequate no-antibody counts OR high antibody-selection counts.

To do that, this pull request augments the existing prob_escape_min_no_antibody_counts and prob_escape_min_no_antibody_frac parameters in config.yaml with two new parameters, prob_escape_min_antibody_counts and prob_escape_min_antibody_frac. For now the two new antibody-count parameters are optional: not specifying them (or setting them to null) is equivalent to the old behavior of having no antibody-selection count threshold at all. However, we recommend that you start specifying them in your config.yaml, as they may become mandatory in a future version.

To recap meaning and suggested settings:

The logic for keeping variants is that they must pass:

(prob_escape_min_no_antibody_counts AND prob_escape_min_no_antibody_frac) OR (prob_escape_min_antibody_counts AND prob_escape_min_antibody_frac)
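The retention rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline's actual code: the function names, the dictionary keys, and the example thresholds are hypothetical. It also assumes that each "frac" threshold is compared against the variant's share of the total counts in that condition, and that leaving either antibody parameter as None disables the antibody branch.

```python
def passes_threshold(count, total, min_counts, min_frac):
    """True if a count meets both the absolute and the fractional threshold.

    Assumption: the fractional threshold is relative to the total counts
    across all variants in that condition.
    """
    return count >= min_counts and count >= min_frac * total


def retain_variants(
    variants,
    *,
    min_no_antibody_counts,
    min_no_antibody_frac,
    min_antibody_counts=None,
    min_antibody_frac=None,
):
    """Keep variants passing the no-antibody thresholds OR the antibody thresholds."""
    no_ab_total = sum(v["no_antibody_count"] for v in variants)
    ab_total = sum(v["antibody_count"] for v in variants)
    # Assumption: if either antibody parameter is None, fall back to the old
    # behavior and filter on the no-antibody thresholds only.
    use_antibody = min_antibody_counts is not None and min_antibody_frac is not None

    kept = []
    for v in variants:
        ok_no_ab = passes_threshold(
            v["no_antibody_count"], no_ab_total,
            min_no_antibody_counts, min_no_antibody_frac,
        )
        ok_ab = use_antibody and passes_threshold(
            v["antibody_count"], ab_total,
            min_antibody_counts, min_antibody_frac,
        )
        if ok_no_ab or ok_ab:
            kept.append(v)
    return kept


# Hypothetical counts: B has few no-antibody counts but many antibody counts.
variants = [
    {"barcode": "A", "no_antibody_count": 100, "antibody_count": 5},
    {"barcode": "B", "no_antibody_count": 2, "antibody_count": 500},
    {"barcode": "C", "no_antibody_count": 1, "antibody_count": 3},
]

old_behavior = retain_variants(
    variants, min_no_antibody_counts=20, min_no_antibody_frac=0.01
)
new_behavior = retain_variants(
    variants,
    min_no_antibody_counts=20,
    min_no_antibody_frac=0.01,
    min_antibody_counts=100,
    min_antibody_frac=0.01,
)
print([v["barcode"] for v in old_behavior])
print([v["barcode"] for v in new_behavior])
```

With these made-up numbers, variant B is dropped under the old rule but retained once the antibody thresholds are set, which is the case this pull request targets.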

We also add a prob_escape_uncensored_max parameter to config.yaml (set to 5 if missing). It is the maximum value at which uncensored prob escape values are clipped; in the current pipeline this only affects plotting. The clipping is needed because variants can now be retained on antibody counts alone, so uncensored prob escape values can be arbitrarily large, even infinite.
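As a rough illustration of the clipping, the sketch below reads prob_escape_uncensored_max from a parsed config with a default of 5 and caps each value at that maximum. The config dictionary, the helper name, and the example values are hypothetical and are not taken from the pipeline's code.

```python
import math

# Hypothetical parsed config.yaml that does not set the new key.
config = {}

# Fall back to 5 when prob_escape_uncensored_max is missing.
uncensored_max = config.get("prob_escape_uncensored_max", 5)


def clip_uncensored(values, upper):
    """Cap each uncensored prob escape value at `upper` (infinity included)."""
    return [min(v, upper) for v in values]


# Made-up uncensored values, including an infinite one.
prob_escape_uncensored = [0.02, 0.8, 7.5, math.inf]
clipped = clip_uncensored(prob_escape_uncensored, uncensored_max)
print(clipped)
```

Values at or below the maximum pass through unchanged, while larger ones (including infinity) are replaced by the maximum so that plot axes stay bounded.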

Note that this change becomes version 2.4.0 of dms-vep-pipeline.