Closed orionarcher closed 2 years ago
Hey @rkingsbury and @hmacdope, I just outlined my current plan for the solvation_analysis
multi-atom rework and I'd love your feedback when you have a chance. Please let me know if anything is unclear or there is anywhere I should expand. This is the last major PR before v0.2 so I'd like to get it right.
Tests are failing right now but due to a relatively silly issue.
Looks awesome @orionarcher, I will need to go through in detail, will try to cover this week. Ping me again if I forget :)
Looks good @orionarcher ; thanks for the well-thought out explanation. Here are a few comments on your proposed API.
Solution
will be renamed toSolute
. All references tosolute
in the current documentation will be renamed tosolvated atom
orsolvated atoms
. I think this better captures what theSolute
class really is, especially as we expand to multi-atomSolute
s.The default initializer for
Solute
will take only a single atom per residue. It will not support multiple identical atoms on a residue. This will be handled by the more general case. As a result, instantiating a Solute for a single atom remains the same. (note that I have already fixed the case with self-solvation identified in issue make solute also be able to be a solvent #31)water = u.select_atoms(...) water_O = u.select_atoms(...) water_O_solute = Solute(water_O, {"water": water})
- Additional initializers will be added to instantiate a
Solute
, these will support multi-atom solutes.The first will allow the user to stitch together multiple solutes to create a new solute.
water_H1 = u.select_atoms(...) water_H2 = u.select_atoms(...) solute_O = Solute(water_O, {"water": water}) solute_H1 = Solute(water_H1, {"water": water}) solute_H2 = Solute(water_H2, {"water": water}) multi_atom_solute = Solute.from_solutes([solute_O, solute_H1, solute_H2]) # maybe this should be a dict?
The second will allow users to simply instantiate a solute from an entire residue (or part of a residue). There may be technical challenges here so this behavior is not guaranteed.
multi_atom_solute = Solute.from_residue(water, {"water": water})
I feel that it might be confusing to have 3 init methods based on the solute type, esp. if 1 is a default Solute(...)
and the other two require from_xxx
methods. It's workable but not ideal. At a minimum, I would say add a Solute.from_atom
that replicates the default __init__
behavior, just so there is consistency (i.e., you can always init a Solute
using a from_xxx
method). Even better in my opinion would be to modify the default __init__
method with type checking so that you can just use Solute(....)
to init the object regardless of the type of residue.
Alternatively, perhaps it would make sense to due a superclass, e.g. MultiAtomSolute(Solute)
and have its __init__
method handle the two cases you enumerate?
In general, I'd say whatever minimizes the number of classes/methods that a user has to become familiar with is the best choice for usability.
I'm not completely sure the "solute_name" column is necessary, but it would be convenient to have.
Yes I think this would be very useful.
- We will retain all of the single atom
Solute
s as a property of the multi-atomSolute
. This would amount to a rough doubling of the memory footprint, but it would make follow up analysis easier. I'm a bit torn here and there may be a better way.
Agreed; since you have to do the analysis anyway it makes sense to retain the data somehow. Note sure about the best way to achieve that. To address memory concerns, you could provide an optional kwarg (Default True
) to retain this data.
>>> print(water.atoms) # what should this be called? >>> [solute_O, solute_H1, solute_H2] # maybe this should be a dict?
What about water.solutes
?
As for the return type, perhaps a dict of {solute_name: Solute
}?
Thanks for the comments!
Even better in my opinion would be to modify the default init method with type checking so that you can just use Solute(....) to init the object regardless of the type of residue.
I quite like this idea. I think this is a much better solution than what I proposed. I'll plan to implement this with three from_x
methods, as you suggested.
Alternatively, perhaps it would make sense to due a superclass, e.g. MultiAtomSolute(Solute) and have its init method handle the two cases you enumerate?
This is definitely something I've been thinking about. Probably the way I'd want to do it would be to have a SingleAtomSolute
as the special case and then have Solute
as the default object that users are working with. SingleAtomSolute
could be a strictly internal representation that is stored in the Solute.atoms
attribute. On one hand, this would allow for a cleaner separation of the "building block" usage of solute and the "full solute" usage of solute. But on the other hand, it might lead to some code duplication and unneeded complexity. For the from_solutes
instantiation, I don't like the idea of users having to first work with a SingleAtomSolute
and then switch to this Solute
object. That alone might tip me towards a single class implementation, but I'd love to hear thoughts.
What about water.solutes? As for the return type, perhaps a dict of {solute_name: Solute}?
I like these both!
I just made several major quality of life changes that will have little impact on the substance of this PR. They will however, make the "Files changed" extremely chaotic. I am going to merge this PR and start a new one focused on the multi-atom solutes issues. I'll link back to this PR to maintain context and copy over the initial post. See PR #72 for the new changes.
Okie dokie, sorry I was a bit slow.
UPDATE:
This PR now implements a number of quality of life changes and solves issue #31. The proposed multi-atom solute changes will be implemented in another PR. See PR #72 for the new changes!
Todos
Notable points that this PR has either accomplished or will accomplish.
Status
Description
This is intended to be a major PR to handle multi-atom solutes. It relates to issues #47, #31, #66, and #58.
I would like to propose the following outline of new functionality. In this description, I will focus on the outward facing API. I'll use the somewhat trivial case of water as an example.
Solution
will be renamed toSolute
. All references tosolute
in the current documentation will be renamed tosolvated atom
orsolvated atoms
. I think this better captures what theSolute
class really is, especially as we expand to multi-atomSolute
s.The default initializer for
Solute
will take only a single atom per residue. It will not support multiple identical atoms on a residue. This will be handled by the more general case. As a result, instantiating a Solute for a single atom remains the same. (note that I have already fixed the case with self-solvation identified in issue #31)Solute
, these will support multi-atom solutes.The first will allow the user to stitch together multiple solutes to create a new solute.
The second will allow users to simply instantiate a solute from an entire residue (or part of a residue). There may be technical challenges here so this behavior is not guaranteed.
To support this, the
solvation_data
dataframe will have two additional columns added, a "residue" column and a "solute_name" column. All analysis classes will be refactored to operate on the "residue" column rather than the "solvated_atom" column. This will make no difference for single-atom solutes but will allow the analysis classes to generalize easily. I'm not completely sure the "solute_name" column is necessary, but it would be convenient to have.When a multi-atom solute is created all of the
solvation_data
dataframes from each constituent single-atom solute will be merged together. The "residue" column will group together solvated atoms on the same residue such that the analysis classes can operate on the whole solute. The API for accessing the residence classes will be identical.Solute
s as a property of the multi-atomSolute
. This would amount to a rough doubling of the memory footprint, but it would make follow up analysis easier. I'm a bit torn here and there may be a better way.For a single atom solute the
atoms
list would still be present but the data within would be identical to thesolvation_data
of the solute itself. Single atom solutes are now just a special case of multi-atom solutes.I'm sure there are many things I am not considering that will come up later, but as a start, I think this plan will allow the package to be generalized with maximum code reuse. I'd love feedback or suggestions on any aspect of the outline above.
Todos
Notable points that this PR has either accomplished or will accomplish.
Status