MobleyLab / benchmarkff

Compare optimized geometries and energies from various force fields with respect to a QM reference.
MIT License

handling outlier molecules in `match_minima.py` #5

Open vtlim opened 4 years ago

vtlim commented 4 years ago

On the full benchmark set, while generating plots with `match_minima.py` using data read in from a pickle file, memory usage grows extremely high (observed > 60 GB) and the process is eventually killed.

I am working on identifying the areas of high memory use with the memory_profiler package, run as `PYTHONPATH=../ python -m memory_profiler ../match_minima.py -i match.in --cutoff 1.0 --plot --readpickle`
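
For a quick first pass without extra dependencies, the standard-library `tracemalloc` module can rank allocation sites by line. This is a different tool from memory_profiler, shown only as a sketch; `build_plot_data` is a hypothetical stand-in for the plotting code path, not a function in `match_minima.py`.

```python
import tracemalloc


def build_plot_data(n_points):
    # Hypothetical stand-in for the code path being investigated.
    return [float(i) for i in range(n_points)]


def top_allocations(func, *args, limit=5):
    """Run func(*args) and return (result, largest allocation sites by line)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # Snapshot while the result is still alive so its memory is counted.
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # statistics("lineno") is sorted with the largest allocations first.
    return result, snapshot.statistics("lineno")[:limit]


data, stats = top_allocations(build_plot_data, 100_000)
for stat in stats:
    print(stat)
```

Each printed line gives a file, a line number, and the total size allocated there, which may be enough to narrow down where memory grows before doing a line-by-line profile.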

vtlim commented 4 years ago

The issue was traced back to molecules with extremely disparate energies. For example, in this plot (disregarding the RMSD axis) some GAFF energies are exceedingly high, around 3.5e7 kcal/mol. [image]

This particular case is due to GAFF missing a specific vdW parameter for polar hydrogen atoms, which leads to overlapping atoms. Additional molecules with this issue are shown here: [image]

A possible solution is to check whether any FF energy differs from the reference method's value by more than some cutoff, and if so, skip generating plots for that molecule. The cutoff would be arbitrarily defined though, say 1000 kcal/mol?
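
A minimal sketch of that check, assuming energies are stored per molecule as a dict mapping method name to a list of conformer energies in kcal/mol. The data layout, the function names, and the `"qm"`/`"gaff"` keys are hypothetical and do not reflect the actual structures in `match_minima.py`.

```python
import math


def exceeds_cutoff(ref_energies, ff_energies, cutoff=1000.0):
    """Return True if any FF energy differs from its reference by more than cutoff."""
    for ref, ff in zip(ref_energies, ff_energies):
        # Skip conformers with missing values rather than flagging them.
        if math.isnan(ref) or math.isnan(ff):
            continue
        if abs(ff - ref) > cutoff:
            return True
    return False


def split_outliers(energies_by_mol, ref_method, cutoff=1000.0):
    """Separate molecules to plot from molecules to skip as outliers."""
    kept, skipped = [], []
    for mol, by_method in energies_by_mol.items():
        ref = by_method[ref_method]
        is_outlier = any(
            exceeds_cutoff(ref, values, cutoff)
            for method, values in by_method.items()
            if method != ref_method
        )
        (skipped if is_outlier else kept).append(mol)
    return kept, skipped


# Made-up example values; mol_b mimics the 3.5e7 kcal/mol GAFF case.
energies = {
    "mol_a": {"qm": [0.0, 1.2], "gaff": [0.0, 2.0]},
    "mol_b": {"qm": [0.0, 0.8], "gaff": [0.0, 3.5e7]},
}
kept, skipped = split_outliers(energies, ref_method="qm", cutoff=1000.0)
print("plot:", kept)
print("skip:", skipped)
```

Returning the skipped molecules, rather than dropping them silently, keeps a record of which cases were excluded so they can be inspected separately.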

vtlim commented 3 years ago

A temporary workaround is commented in https://github.com/MobleyLab/benchmarkff/blob/master/03_analysis/match_minima.py#L847-L852