QChASM / AaronTools.py

Python tools for automating routine tasks encountered when running quantum chemistry computations.
https://aarontools.readthedocs.io/en/latest/
GNU General Public License v3.0
43 stars 8 forks source link

Hashing and comparison #3

Closed rlaplaza closed 3 years ago

rlaplaza commented 3 years ago

Hello again,

This is more of an open comment. Is there a native way to do comparisons/sorting between Geometry objects?

I think it would be useful to make Geometry objects (as well as Atoms) hashable based on their coordinates, and define an eq method, so that you could do quick comparisons between Geometry(es) using the standard python operators, or use Geometry(es) as dictionary keys to store QM properties. Is this something that sounds reasonable? I could potentially code the snippet if you feel it's coherent design-wise.

Thanks in advance, I don't want to come across as pushy - just an enthusiastic user!

Regards,

ajs99778 commented 3 years ago

I can see how having a dictionary with Geometries as keys could be useful. We did discuss hashing Atoms/Geometries a while ago when I was trying to shoe-horn in python 2 compatibility. I stopped pursuing the python 2-specific stuff, as it was getting out of hand. Regardless of if we can make Geometries hash-able, I think the python guidance is to not implement __hash__ for mutable objects. I'll ping @vingman to see what her thoughts are.

Although Geometries are not hash-able, Atoms have defined comparison and Geometries have defined equality. Atom comparison is mostly dependent on a canonical ranking system (I think it's basically what SMILES uses), with file index and substitutions or ligand mappings used to break ties. Geometry equality is based on number of atoms and RMSD. The RMSD allows for a little wiggle room for numerical precision. It doesn't look like we check elements, though we probably should. I think removing Geometry.__eq__ would cause Geometries to become hash-able again, though it would not be a predictable hash and Geometries would only be equal if they were the same instance.

The options I can think of are:

  1. ignore python's guidance and define Geometry.__hash__
  2. remove Geometry.__eq__, allowing for a random hash to be assigned (I don't know if the hash would be useful for determining equality)
  3. define subclasses of Atom and Geometry that are immutable and have a defined hash that can be used to determine equality and perform comparisons

I think there are reasons we would want to keep Geometry.__eq__ and maintain good coding practices.

rlaplaza commented 3 years ago

Thanks for the quick answer!

I do see the point of mutable objects with hash, but in our particular case I guessed that a hash based on the self coordinates would be robust irrespective of mutation because in the end we care about coordinates - properties, not about any other metadata. I might be wrong tho. In that sense, maybe the hashing could be moved to some Theory-derived object (the calculation per se).

Still, this is more of a comment than a request/bug. As a side note, one thing hashes are nice for is implementing caches to avoid repeating the same QM calculation with a python driver.

Thanks in any case!

ajs99778 commented 3 years ago

We are always looking to make AaronTools useful for its users as much as we can. One other thing I've thought of: making the object immutable after it has been hashed. We wouldn't have to define entirely new subclasses, and you'd get an error if you try to set an attribute after the Geometry is used as a dictionary key.

With respect to not reusing a Theory by accident, would having equality defined for Theory be sufficient? With that, you could use == for comparison:

theory1 = Theory(method="B3LYP", basis="def2-SVP")
theory2 = Theory(method="B3LYP", basis="def2-SVP")
theory1 is theory2 # False
theory1 == theory2 # True 

Currently, both of those would be false.

rlaplaza commented 3 years ago

Just as a FYI, the hashing system you just set up works for me. IDK if its the most pythonic option, but its basically what I had half-implemented as a child class. Very grateful right now!

ajs99778 commented 3 years ago

The latest push has a hashable Geometry subclass (HashaleGeometry, imported from AaronTools.geometry). It's as easy to create as a Geometry, but there's also a Geometry.get_hashable method that returns a HashableGeometry. We may go back to the way it was before, but I wanted to try this too.

ajs99778 commented 3 years ago

I've changed back to the previous implementation. We decided that it was easier to use.