choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License

Make solute-only trajectory writing optional #1179

Closed by jchodera 1 year ago

jchodera commented 1 year ago

Typical calculations generate NetCDF files that take up way too much space.

For example, a typical SARS-CoV-2 Mpro calculation will generate a positions variable in the complex NetCDF file that consumes (18 replicas) × (5000 iterations/replica) × (10000 atoms) × (3 dimensions/atom) × (4 bytes/dimension) ≈ 10.8 GB of data.
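The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the helper name `positions_nbytes` is hypothetical (it is not part of perses), and it assumes single-precision values, no NetCDF compression, and no other variables in the file.

```python
def positions_nbytes(n_replicas, n_iterations, n_atoms, n_dims=3, bytes_per_value=4):
    """Uncompressed size, in bytes, of positions stored for every replica at every iteration.

    Hypothetical helper for illustration; assumes 4-byte (single-precision) values.
    """
    return n_replicas * n_iterations * n_atoms * n_dims * bytes_per_value


# Numbers quoted in the issue for a typical SARS-CoV-2 Mpro complex phase.
total = positions_nbytes(n_replicas=18, n_iterations=5000, n_atoms=10000)
print(f"{total:,} bytes = {total / 1e9:.1f} GB")
```

Because the size is linear in the number of stored atoms, any reduction in the atom selection shrinks this variable proportionally, and storing no atoms removes it entirely.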

Instead, we should default to atom_selection = None (or perhaps an empty list, depending on what the multistate sampler expects) and remove atom_selection: not water from the input YAML in our examples.

We can still extract some useful analysis out of the checkpoints as needed.

ijpulidos commented 1 year ago

Changing the default value to none (MDTraj selection language) in https://github.com/choderalab/perses/blob/4b3facc4c4590da22ef4a98dec514d8324d78d17/perses/app/setup_relative_calculation.py#L587 should do it. We should also make the corresponding changes in the examples and template YAML files.

ijpulidos commented 1 year ago

We probably want to delete the atom selection from the examples.

mikemhenry commented 1 year ago

https://github.com/choderalab/perses/pull/1185