brianhie / viral-mutation

Language modeling of viral evolution
MIT License

Suboptimal representation of benchmarked methods #5

Closed brianhie closed 3 years ago

brianhie commented 3 years ago

Superfluous alphabet characters and different site ranges caused a mismatch between the in silico DMS and the validation data. This also leads to comparison issues with baseline models.
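To illustrate the kind of mismatch described above, here is a minimal sketch of restricting in silico DMS predictions to the mutations that the validation data can actually score. All names (`CANONICAL_AAS`, `align_to_validation`, the `predictions` layout) are hypothetical and chosen for illustration; this is not the repository's code, and the actual fix may differ.

```python
# Hypothetical sketch; names and data layout are assumptions, not the repo's API.
CANONICAL_AAS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def align_to_validation(predictions, validated_sites):
    """Keep only predictions that the validation data can score.

    predictions: dict mapping (site, mutant_aa) -> model score
    validated_sites: iterable of site positions assayed in the DMS data
    """
    validated_sites = set(validated_sites)
    return {
        (site, aa): score
        for (site, aa), score in predictions.items()
        if aa in CANONICAL_AAS and site in validated_sites
    }


# Toy predictions, invented for illustration only.
predictions = {
    (10, "A"): 0.3,
    (10, "X"): 0.9,   # superfluous alphabet character, not a real substitution
    (11, "K"): 0.5,
    (500, "G"): 0.7,  # site outside the assayed range
}
aligned = align_to_validation(predictions, validated_sites=range(1, 100))
print(sorted(aligned))  # [(10, 'A'), (11, 'K')]
```

Applying the same filter to every method before scoring keeps the comparison with baseline models on an identical set of mutations.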

Issue and scope of changes to results are under active investigation.

brianhie commented 3 years ago

After initial investigation, the issues have been addressed and, in summary (and thankfully), they do not affect the conclusions of our paper, although some updates do need to take place.

In particular, the issues were leading to suboptimal representation of baseline methods. However, CSCS with our model still outperforms the benchmarks on all DMS datasets tested, across the full range of cutoffs defining an escape mutation:

[Plot: path49142-6]

In the plot above, the dashed line indicates the representative escape cutoff reported in the initial paper. CSCS predictive performance increases with escape cutoff stringency and consistently outperforms baseline methods, especially at the most stringent antibody selection cutoffs, where the assayed mutations have the strongest experimental evidence of escape potential.

Investigation is ongoing, including updates to the original paper.
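A cutoff sweep of this kind can be sketched as follows. This is an illustrative example only: it assumes that a mutation is labeled as escape when its measured value is at or above the cutoff and that performance is summarized as ROC AUC, neither of which is spelled out in this thread. The function names and the toy data are hypothetical.

```python
# Hypothetical sketch; the labeling rule and metric are assumptions.
def auc(scores, labels):
    """ROC AUC by pairwise comparison of positives and negatives (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return float("nan")  # undefined when one class is empty
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_cutoff(model_scores, escape_values, cutoffs):
    """Relabel mutations at each cutoff and recompute the AUC of the model scores."""
    return {
        c: auc(model_scores, [v >= c for v in escape_values])
        for c in cutoffs
    }


# Toy data, invented for illustration only.
model_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
escape_values = [0.50, 0.05, 0.20, 0.01, 0.02]
results = auc_by_cutoff(model_scores, escape_values, cutoffs=[0.04, 0.1, 0.3])
for cutoff, value in results.items():
    print(f"cutoff={cutoff}: AUC={value:.3f}")
```

Running the same sweep for each baseline's scores against the same labels gives one curve per method, comparable at every cutoff.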

brianhie commented 3 years ago

Instructions for reproducing the new benchmarks have been added to the README here: https://github.com/brianhie/viral-mutation#benchmarking-experiments, along with updates to the data tarball.

brianhie commented 3 years ago

AUCs have been updated at results/escape_results.txt.

The cutoff experiment has also inspired a new analysis showing that increasing the stringency of the experimental evidence for fitness, or for (loss of) antibody binding, also results in better predictive performance of grammaticality or semantic change, respectively, consistent with our biological hypotheses! This new analysis, along with the benchmarking cutoff analysis above and a new variant analysis, has been added to our postscript at results/HZBB21_Postscript_v0.pdf.

brianhie commented 3 years ago

Paper updated, closing.