README last updated: Nov 09 2020
Objective: Compare optimized geometries and energies from various force fields with respect to a QM reference.
This repository comprises code to extract molecule datasets from QCArchive, run energy minimizations with various force fields, and analyze the resulting geometries and energies with respect to QM reference data from QCArchive.
See our work in this preprint: Lim et al.; Benchmark Assessment of Molecular Geometries and Energies from Small Molecule Force Fields. 2020.
Directories in this repo:
- 01_setup: Extract molecules from QCArchive, convert to OpenEye mols, and standardize conformers and titles.
- 02_calc: Run energy minimizations for various force fields.
- 03_analysis: Analyze output energies and geometries.
- examples: See this directory for example results and plots.
- molecules: The molecule sets used in our benchmark analyses.
- tools: A handful of helpful scripts (align structures for PDF output, find specific moieties, extract conformers by SD tag value).

File descriptions:

directory | file | description |
---|---|---|
01_setup | extract_qcarchive_dataset.ipynb | write out molecules from a QCArchive database which have completed QM calculations |
01_setup | combine_conformers.ipynb | of the molecules from extract_qcarchive_dataset.ipynb, combine conformers that are labeled as different molecules |
02_calc | minimize_ffs.py | minimize all molecules in an input SDF file with a specified force field |
03_analysis | color_by_moiety.py | generate ddE vs TFD (or RMSD) scatter plots highlighting specific moieties by color |
03_analysis | compare_ffs.py | compare FF-minimized molecules on their geometries and energies (no conformer matching) |
03_analysis | match_minima.py | similar to compare_ffs.py in comparing geometries and energies, but analyzes RMSD-matched structures |
03_analysis | probe_parameter.py | find all molecules in a set that use certain specified parameter(s) |
03_analysis | reader.py | reader for molecule sets and text input files called by the other analysis scripts |
03_analysis | tailed_parameters.py | identify parameters that may be overrepresented in high RMSD/TFD tails for FFXML force fields |

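As a rough illustration of the ddE quantity named in the table, the sketch below assumes ddE is the difference between FF and QM relative conformer energies, each taken with respect to the same reference conformer. That definition, the function names, the choice of reference conformer, and the energy values are assumptions for illustration; they are not taken from the analysis scripts.

```python
def relative_energies(energies, ref_index=0):
    """Return each conformer energy relative to the reference conformer."""
    ref = energies[ref_index]
    return [e - ref for e in energies]


def dde(ff_energies, qm_energies, ref_index=0):
    """Assumed definition: ddE_i = dE_FF,i - dE_QM,i for matched conformers."""
    if len(ff_energies) != len(qm_energies):
        raise ValueError("FF and QM energy lists must have the same length")
    ff_rel = relative_energies(ff_energies, ref_index)
    qm_rel = relative_energies(qm_energies, ref_index)
    return [f - q for f, q in zip(ff_rel, qm_rel)]


# Hypothetical energies (same units) for three conformers of one molecule.
qm = [0.0, 1.2, 3.5]
ff = [10.0, 11.0, 14.0]
print(dde(ff, qm))
```

Using relative energies removes the arbitrary offset between FF and QM absolute energies, so only conformer-to-conformer differences are compared.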
conda create -n parsley python=3.6 matplotlib numpy seaborn
conda activate parsley
conda install -c openeye -c conda-forge -c omnia rdkit openeye-toolkits qcfractal qcportal openforcefield cmiles openmm
The packages in VTL's conda environment are documented in this repo as parsley.yml.

Scripts and commands used in the workflow, in order (* marks scripts from OEChem):
- extract_qcarchive_dataset.ipynb
- combine_conformers.ipynb
- molextract.py*
- cat whole_02_good.sdf whole_03_redosort.sdf > whole_04_combine.sdf
- combine_conformers.ipynb
- awk '/SMILES/{getline; print}' whole_05_renew.sdf > whole_05_renew.smi
- mols2pdf.py
- molchunk.py*
- minimize_ffs.py
- cat_mols.py*
- cat or cat_mols.py*
- compare_ffs.py
- get_by_tag.py
- color_by_moiety.py
- match_minima.py
- tailed_parameters.py and probe_parameter.py

*The OEChem scripts referred to above are located here.
- molextract.py
- molchunk.py (VTL modified to use OEAbsCanonicalConfTest)
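
The awk command listed above prints the line that follows each line containing SMILES. A minimal Python sketch of the same idea is shown here; the function name and the sample SD tag layout are hypothetical, and the file names in the comment are simply those from the awk command.

```python
def lines_after_tag(lines, tag="SMILES"):
    """Yield the line following each line that contains `tag`.

    The following line is consumed, so it is not itself tested for the tag.
    A tag on the final line yields nothing here.
    """
    iterator = iter(lines)
    for line in iterator:
        if tag in line:
            following = next(iterator, None)
            if following is not None:
                yield following.rstrip("\n")


# Hypothetical SDF fragment with a SMILES data field.
sample = [
    "mol_1\n",
    ">  <SMILES>\n",
    "CCO\n",
    "\n",
    "$$$$\n",
]
print(list(lines_after_tag(sample)))  # ['CCO']

# With files, this would look like:
# with open("whole_05_renew.sdf") as src, open("whole_05_renew.smi", "w") as dst:
#     for smiles in lines_after_tag(src):
#         dst.write(smiles + "\n")
```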

Note: Some of the analysis can take a long time for multiple force fields and many molecules (e.g. up to 2 hours on compare_ffs.py or 30-45 min on tailed_parameters.py).
To explore the analyzed data, adjust plots, etc. without re-analyzing data, you can input the pickle file written out from the previously run analysis.
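
For inspecting such a pickle file outside the analysis scripts, a minimal sketch with Python's standard pickle module might look like the following. The file name and the structure of the stored object are hypothetical here; check what the analysis script actually writes before relying on any keys.

```python
import os
import pickle
import tempfile


def load_results(pickle_path):
    """Load a previously written analysis result from a pickle file."""
    with open(pickle_path, "rb") as handle:
        return pickle.load(handle)


# Round-trip demo with made-up data standing in for real analysis output.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_results.pickle")
with open(demo_path, "wb") as handle:
    pickle.dump({"force_field": "example_ff", "rmsd": [0.1, 0.4]}, handle)

results = load_results(demo_path)
print(type(results), list(results))
```

Only unpickle files you created or trust, since loading a pickle can execute arbitrary code.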
See more focused issues in the issue tracker.