orionarcher commented 2 years ago

Description

This PR was split off from PR #70 to better separate quality-of-life changes from API changes. See that PR for some history and discussion.

This is intended to be a major PR to handle multi-atom solutes. It relates to issues #47, #66, and #58.

I would like to propose the following outline of new functionality. In this description, I will focus on the outward facing API. I'll use the somewhat trivial case of water as an example.

Solution will be renamed to Solute. All references to solute in the current documentation will be renamed to solvated atom or solvated atoms. I think this better captures what the Solute class really is, especially as we expand to multi-atom Solutes.
The default initializer for Solute will take only a single atom per residue. It will not support multiple identical atoms on a residue. This will be handled by the more general case. As a result, instantiating a Solute for a single atom remains the same.

water = u.select_atoms(...)
water_O = u.select_atoms(...)
water_O_solute = Solute(water_O, {"water": water})

Additional initializers will be added to instantiate a Solute, these will support multi-atom solutes.

The first will allow the user to stitch together multiple solutes to create a new solute.

water_H1 = u.select_atoms(...)
water_H2 = u.select_atoms(...)
solute_O = Solute(water_O, {"water": water})
solute_H1 = Solute(water_H1, {"water": water})
solute_H2 = Solute(water_H2, {"water": water})

multi_atom_solute = Solute.from_solutes([solute_O, solute_H1, solute_H2])  # maybe this should be a dict?

The second will allow users to simply instantiate a solute from an entire residue (or part of a residue). There may be technical challenges here so this behavior is not guaranteed.

multi_atom_solute = Solute.from_residue(water, {"water": water})

To support this, the solvation_data dataframe will have two additional columns added, a "residue" column and a "solute_name" column. All analysis classes will be refactored to operate on the "residue" column rather than the "solvated_atom" column. This will make no difference for single-atom solutes but will allow the analysis classes to generalize easily. I'm not completely sure the "solute_name" column is necessary, but it would be convenient to have.
When a multi-atom solute is created all of the solvation_data dataframes from each constituent single-atom solute will be merged together. The "residue" column will group together solvated atoms on the same residue such that the analysis classes can operate on the whole solute. The API for accessing the residence classes will be identical.

multi_atom_solute.coordination_number["water"]   # valid property

We will retain all of the single atom Solutes as a property of the multi-atom Solute. This would amount to a rough doubling of the memory footprint, but it would make follow up analysis easier. I'm a bit torn here and there may be a better way.

>>> print(water.atoms)  # what should this be called?
>>> [solute_O, solute_H1, solute_H2]  # maybe this should be a dict?

For a single atom solute the atoms list would still be present but the data within would be identical to the solvation_data of the solute itself. Single atom solutes are now just a special case of multi-atom solutes.

water_O_solute.atoms[0].solvation_data = water_O_solute.solvation_data

I'm sure there are many things I am not considering that will come up later, but as a start, I think this plan will allow the package to be generalized with maximum code reuse. I'd love feedback or suggestions on any aspect of the outline above.

Todos

Notable points that this PR has either accomplished or will accomplish.

[ ] make solutions composable so that multi-atom solutes can be constructed systematically
[ ] put guardrails in place to prevent misuse

Status

[ ] Ready to go

codecov[bot] commented 2 years ago

Codecov Report

Merging #72 (8190698) into main (e25c72c) will not change coverage. The diff coverage is n/a.

:exclamation: Current head 8190698 differs from pull request most recent head 7fcd71e. Consider uploading reports for the commit 7fcd71e to get more accurate results

Additional details and impacted files

review-notebook-app[bot] commented 2 years ago

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

orionarcher commented 2 years ago

Moved to PR #75 because I accidentally merged this.

MDAnalysis / solvation-analysis

Support for multi-atom solutes #72

Description

Todos

Status

Codecov Report