bjodah / chempy

⚗ A package useful for chemistry written in Python
BSD 2-Clause "Simplified" License
536 stars 79 forks source link

Add __hash__ method to Substance class #226

Closed DNIIBOY closed 4 months ago

DNIIBOY commented 4 months ago

Allow substances to be used in dictionaries, sets etc. by including a hash method in the class.

jeremyagray commented 4 months ago

Two questions:

  1. Wouldn't the hash need to be computed on all the attributes used in __eq__? Otherwise, two substances differing in only their data would be unequal and hash identically. Tests to check for this should probably be included. The documentation states that equal objects must hash identically and this code seems to do that, but I would also expect substances with equal hashes to be equal and they won't necessarily be without considering the data and composition attributes.
  2. Since tuples are hashable, why not just use return hash(self.attrs)? Your function is pretty much how tuples are hashed anyway, I think (unless I'm missing something...).

Looking through the rest of the classes in chemistry.py, it looks like there are other instances of __eq__ and __hash__ with similar issues, so maybe all of them need to be addressed simultaneously especially regarding which attributes really need to be checked for equality and then use the same attributes for hashing.

DNIIBOY commented 4 months ago
  1. The reason for not including all elements of self.attrs, is that data and composition are both dictionaries, which are unhashable. We could look at converting their types and then hashing? or maybe just hashing the elements within them? The issue here is that dictionaries are supposed to be unsorted, so just changing their type will allow two identical objects to be hashed to different values.
  2. Doing hash(self.attrs) will just hash the strings of self.attrs so the hash will be identical for all objects.

The idea behind only hashing these fields, was that self.name and the others were defined by data and composition, but i'm not sure if this assumption is correct.

bjodah commented 4 months ago

We can hash the "tuple of the sorted items", e.g.: tuple(sorted(some_dict.items())). Another alternative (maybe a separate issue, and not necessarily adressed in this PR), would be to use frozendict from PyPI.

DNIIBOY commented 4 months ago

We can hash the "tuple of the sorted items", e.g.: tuple(sorted(some_dict.items())). Another alternative (maybe a separate issue, and not necessarily adressed in this PR), would be to use frozendict from PyPI.

I agree and made the change, should i fix the spelling error stopping CI as well?

bjodah commented 4 months ago

Thank you.

No need to fix the spelling (only if you want to), the CI config needs some love, but we can address that in another PR.

On Sun, Feb 25, 2024, 17:52 Daniel Nettelfield @.***> wrote:

We can hash the "tuple of the sorted items", e.g.: tuple(sorted(some_dict.items())). Another alternative (maybe a separate issue, and not necessarily adressed in this PR), would be to use frozendict from PyPI.

I agree and made the change, should i fix the spelling error stopping CI as well?

— Reply to this email directly, view it on GitHub https://github.com/bjodah/chempy/pull/226#issuecomment-1962997908, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADWUMHOD7FXLMPFWPURKC3YVNT45AVCNFSM6AAAAABDYIJOGWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRSHE4TOOJQHA . You are receiving this because you commented.Message ID: @.***>

DNIIBOY commented 4 months ago

All right, i'll leave it for another PR

Gustav2 commented 4 months ago

Whats the status? I really need this for a project, can it be merged soon?

bjodah commented 4 months ago

Thank you @DNIIBOY , much appreciated!