biqqles / dataclassy

A fast and flexible reimplementation of data classes
https://pypi.org/project/dataclassy
Mozilla Public License 2.0
81 stars 9 forks source link

Implementing __hash__ properly #26

Closed biqqles closed 3 years ago

biqqles commented 3 years ago

The current __hash__ implementation is not good. It simply filters the fields of the instance it's being applied to, ignores fields that aren't hashable and creates a tuple which it then hashes. This is slow and also provides no warning if a field that should be hashable is not, potentially causing hash collisions. The fact that no one has complained so far is likely testament to the fact that it's almost always better to write a __hash__ manually anyway. But this should be improved.

There are at least two ways to do this:

biqqles commented 3 years ago

I have merged the new, more flexible, Hint implementation using Union and the new Hashed type into master. On second thought, I don't believe it will affect mypy because I presume it is either naive enough to read external annotations as strings or smart enough to be able to evaluate the __class_getitem__, since that is itself fully type hinted.

These changes will be included in the next bugfix release, v0.7.3. My plan for v0.8 is to include a new __hash__ implementation using Hashed, since this will be a breaking change.