This pull request addresses a point noted by @Bernadetadad in #144.
Namely, when computing prob_escape for antibody escape analyses, we previously imposed only a no-antibody condition count threshold to determine which variants to retain. But some variants have very low no-antibody counts yet very high counts in the antibody-selected conditions. These are almost certainly strong escape mutations, and by manually looking at some examples in #144, @Bernadetadad finds evidence that they are.
Therefore, we want to change the logic to retain variants if they have adequate no-antibody counts OR if they have high antibody selection counts.
To do that, this pull request augments the existing prob_escape_min_no_antibody_counts and prob_escape_min_no_antibody_frac parameters by adding two new parameters, prob_escape_min_antibody_counts and prob_escape_min_antibody_frac, in config.yaml. For now the two new antibody-count parameters are optional, and not specifying them (or setting them to null) is equivalent to the old behavior of having no antibody selection count threshold at all. However, we recommend that you start specifying them in your config.yaml, as they may become mandatory in a future version.
To recap the meaning and suggested settings:
prob_escape_min_no_antibody_counts: require variants to have this many counts in the no-antibody condition. A reasonable value is something like 20 (range 10 to 25, depending on how deeply you sequence).
prob_escape_min_no_antibody_frac: require a variant to be at least this fraction of all of the no-antibody counts. A reasonable value is 0.1 / (number of variants in library), perhaps a bit smaller if your library is highly skewed.
prob_escape_min_antibody_counts: require variants to have this many counts in the antibody condition. A reasonable value is twice prob_escape_min_no_antibody_counts (so something like 40), depending on how deeply you sequence.
prob_escape_min_antibody_frac: require a variant to be at least this fraction of all of the antibody counts. A reasonable value is something like 2 / (number of variants in library), perhaps a bit smaller if your library is highly skewed. But do make it substantially larger (say ~20 to 50-fold larger) than prob_escape_min_no_antibody_frac.
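As a rough illustration of the arithmetic behind these suggestions, the sketch below derives the four values from a library size. The function name `suggested_thresholds` and its arguments are hypothetical and not part of dms-vep-pipeline; the numbers are only the rules of thumb listed above, so adjust them for your sequencing depth and library skew.

```python
def suggested_thresholds(n_variants, min_no_antibody_counts=20):
    """Return rule-of-thumb settings for a library of `n_variants` variants.

    Hypothetical helper for illustration only; it just encodes the
    suggestions in this pull request description.
    """
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return {
        # roughly 10 to 25 depending on sequencing depth
        "prob_escape_min_no_antibody_counts": min_no_antibody_counts,
        # 0.1 / (number of variants in library)
        "prob_escape_min_no_antibody_frac": 0.1 / n_variants,
        # twice the no-antibody count threshold
        "prob_escape_min_antibody_counts": 2 * min_no_antibody_counts,
        # 2 / (number of variants in library), i.e. 20-fold the no-antibody fraction
        "prob_escape_min_antibody_frac": 2 / n_variants,
    }


if __name__ == "__main__":
    for key, value in suggested_thresholds(50_000).items():
        print(f"{key}: {value}")
```

The printed values can then be copied into config.yaml by hand.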
The logic for keeping variants is that they must pass:
(prob_escape_min_no_antibody_counts AND prob_escape_min_no_antibody_frac) OR (prob_escape_min_antibody_counts AND prob_escape_min_antibody_frac)
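The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function `keep_variant` and its argument names are hypothetical, the use of `>=` for each comparison is an assumption, and so is treating the antibody route as disabled when either antibody parameter is null.

```python
def keep_variant(
    no_antibody_count,
    no_antibody_total,
    antibody_count,
    antibody_total,
    *,
    min_no_antibody_counts,
    min_no_antibody_frac,
    min_antibody_counts=None,
    min_antibody_frac=None,
):
    """Return True if a variant passes the (no-antibody) OR (antibody) filter."""

    def passes(count, total, min_counts, min_frac):
        # Fraction of all counts in this condition; 0 if the condition is empty.
        frac = count / total if total else 0.0
        return count >= min_counts and frac >= min_frac

    passes_no_antibody = passes(
        no_antibody_count, no_antibody_total,
        min_no_antibody_counts, min_no_antibody_frac,
    )
    # Assumption: if either antibody threshold is unset (null), fall back to
    # the old behavior of filtering on the no-antibody condition only.
    if min_antibody_counts is None or min_antibody_frac is None:
        return passes_no_antibody
    passes_antibody = passes(
        antibody_count, antibody_total, min_antibody_counts, min_antibody_frac
    )
    return passes_no_antibody or passes_antibody
```

For example, a variant with 2 no-antibody counts but 500 antibody counts (out of a million reads in each condition) is dropped under the old behavior and retained once the antibody thresholds are set to something like 40 counts and a fraction of 2e-5.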
We also add a prob_escape_uncensored_max value to config.yaml (set to 5 if missing), which is the maximum value at which we clip uncensored prob escape values; this clipping only affects plotting in the current pipeline. It is needed because variants that are retained just on antibody counts can have uncensored prob escape values as large as infinity.
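The clipping can be pictured as follows. This is a sketch under stated assumptions rather than the pipeline's implementation: `clip_uncensored` is a hypothetical helper, `config` stands in for the parsed config.yaml as a plain dict, and only the missing-key case (default of 5) is handled.

```python
import math


def clip_uncensored(values, config):
    """Clip uncensored prob escape values at prob_escape_uncensored_max."""
    # Use 5 when the key is missing from the config, as described above.
    uncensored_max = config.get("prob_escape_uncensored_max", 5)
    return [min(v, uncensored_max) for v in values]


if __name__ == "__main__":
    # An infinite value is capped at the configured maximum.
    print(clip_uncensored([0.2, 7.5, math.inf], {}))
```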
Note that this change becomes version 2.4.0 of dms-vep-pipeline.