OpenFreeEnergy / alchemiscale

a high-throughput alchemical free energy execution system for use with HPC, cloud, bare metal, and Folding@Home
http://alchemiscale.org

[user story] COVID Moonshot and ASAP large-scale free energy calculations for synthesis prioritization #5

Open jchodera opened 2 years ago

jchodera commented 2 years ago

In broad terms, what are you trying to do?

The COVID Moonshot and its successor, ASAP, are pursuing open-science, patent-free antiviral drug discovery projects for the public good. The hit-to-lead and lead-optimization process involves generating large virtual synthetic libraries, in which a common intermediate is used to make many analogues from a large library of building blocks supplied by CROs such as Enamine, WuXi, and Sai. Predictions over these large virtual synthetic libraries are used to prioritize compounds for synthesis within both the hit-to-lead and lead-optimization phases.

As input, we would like to incorporate both manually submitted compound designs and enumerated virtual synthetic libraries (example). We would then like to automate the full workflow: selecting appropriate X-ray structures with related reference ligands (many co-crystal structures are often available), preparing the systems for free energy calculations, building a transformation network that connects reference ligands with designs (including redundancy), executing the relative free energy calculations on Folding@home, performing on-the-fly analysis, and providing up-to-date distilled results that can be pulled into a dashboard (example from fah-xchem) for the chemistry design teams to use when selecting designs for synthesis. All generated data will be openly archived and usable by anyone for a variety of research purposes (methodology improvement, ML, compound design).
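To make the "transformation network with redundancy" step concrete, here is a minimal sketch, not a proposed implementation. It assumes each ligand can be reduced to a set of features and that a Jaccard score is an acceptable stand-in for whatever similarity or atom-mapping score would really be used. All names (`build_network`, `jaccard`, the toy ligands) are hypothetical.

```python
def jaccard(a, b):
    """Toy similarity between two feature sets (placeholder for a real score)."""
    return len(a & b) / len(a | b)


def build_network(references, designs, redundancy=2):
    """Connect each design to its `redundancy` most similar reference ligands.

    `references` and `designs` map ligand names to feature sets.
    Returns a list of (reference_name, design_name) edges.
    """
    edges = []
    for design_name, design_features in designs.items():
        # Rank references by descending similarity; break ties by name
        # so the result is deterministic.
        ranked = sorted(
            references,
            key=lambda ref: (-jaccard(references[ref], design_features), ref),
        )
        for ref_name in ranked[:redundancy]:
            edges.append((ref_name, design_name))
    return edges


# Toy data: reference ligands come from X-ray structures, designs do not.
references = {
    "ref_A": {"core", "Cl"},
    "ref_B": {"core", "F"},
    "ref_C": {"core", "OMe", "F"},
}
designs = {
    "design_1": {"core", "Cl", "F"},
    "design_2": {"core", "OMe"},
}

edges = build_network(references, designs, redundancy=2)
```

With `redundancy=2`, every design is reached from two different reference ligands, so one failed or poorly converged edge does not leave a design without an estimate.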

Future improvements could include allocating effort adaptively among relative and absolute free energy calculations to enable more efficient execution, as well as using much more efficient single-replica methods such as SAMS / Times Square Sampling in OpenMM and/or GROMACS.

How do you believe using this project would help you to do this?

The current process of setting up a COVID Moonshot Sprint involves manually executing a sequence of scripts that prepare the calculations for execution on Folding@home, a significant amount of babysitting during the launch of the Folding@home projects, and then a tailored automated script that combines analysis and dashboard generation. Many of these steps are generalizable components for setting up, executing, and analyzing automated free energy calculations.

Instead, by factoring all reusable components into conda-installable modules that use common data models and APIs, we can collaboratively build an ecosystem that can be assembled into reusable workflows that can automate, simplify, robustify, streamline, and optimize/improve this process to enable us to scale to support many discovery projects. This work should be synergistic with supporting other similar projects by providing compute capabilities for open discovery or research projects; additional use cases (such as prediction of resistance mutations and XChem Fragalysis) will be added soon.

What problems do you anticipate with using this project to achieve the above?

The most critical step is to identify all the reusable components and modules, clearly define extensible data and object models (e.g. transformations should support small molecule transformations for one or more protein targets, point mutations, and transformations in other phases such as lipids), and define clear base APIs that enable innovation in implementations without breaking those APIs. We also need to ensure the ecosystem is conda-installable.
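As a rough illustration of what an extensible object model with a stable base API might look like, here is a sketch under stated assumptions. The class names, fields, and the `kind()` / `describe()` methods are all hypothetical and are not the API of this project or of any existing library. The point is only that new transformation types subclass a small base without changing the code that consumes them.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transformation(ABC):
    """Hypothetical base model: an alchemical change between two end states."""
    initial: str
    final: str
    phase: str = "complex"  # e.g. "complex", "solvent", "membrane"

    @abstractmethod
    def kind(self) -> str:
        """Short label identifying the transformation type."""

    def describe(self) -> str:
        # Shared base API: consumers only rely on this method.
        return f"{self.kind()}: {self.initial} -> {self.final} ({self.phase})"


@dataclass(frozen=True)
class SmallMoleculeTransformation(Transformation):
    # One or more protein targets the ligand pair is evaluated against.
    targets: Tuple[str, ...] = ()

    def kind(self) -> str:
        return "small-molecule"


@dataclass(frozen=True)
class PointMutation(Transformation):
    def kind(self) -> str:
        return "point-mutation"


def summarize(transformations):
    """Works for any subclass because it uses only the base API."""
    return [t.describe() for t in transformations]


summary = summarize([
    SmallMoleculeTransformation("ref_A", "design_1", targets=("target_1",)),
    PointMutation("wild_type", "mutant_1"),
    SmallMoleculeTransformation("lipid_A", "lipid_B", phase="membrane"),
])
```

Here `summarize` never inspects the concrete type, so adding a new transformation class does not require changing it.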

Storage requirements should pose less of a problem if we adopt a clear separation between high-value low-storage, mid-value medium-storage, and low-value large-storage categories:

dotsdl commented 2 years ago

Raw notes from story review, shared here for visibility: