PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Apache License 2.0
12.06k stars 744 forks source link

add pyclass `hash` option #4206

Closed Icxolu closed 4 months ago

Icxolu commented 4 months ago

For more dataclass like usage and similar to #4202, this adds a hash pyclass option to implement the __hash__ slot using the type's Hash implementation.

davidhewitt commented 4 months ago

Actually, one thought - should we make this only available with #[pyclass(frozen)]? Otherwise I can foresee accidents where users mutate data and corrupt their hash maps.

(If they really want a __hash__ on a mutable pyclass, they then just add it manually the old way.)

Icxolu commented 4 months ago

Actually, one thought - should we make this only available with #[pyclass(frozen)]? Otherwise I can foresee accidents where users mutate data and corrupt their hash maps.

Given that it can be implemented trivially if needed, I think that makes sense to take the safe route here đź‘Ť.

On that note, should we also require #[pyclass(eq)] (once we have that) or is that not needed?

davidhewitt commented 4 months ago

On that note, should we also require #[pyclass(eq)] (once we have that) or is that not needed?

Great question. So, quoting the Python docs:

If a class does not define an eq() method it should not define a hash() operation either

I guess that wording is strong enough that we should indeed be encouraging users to define eq if they want hash? So perhaps yes, we should require eq, especially given that this is designed to be a convenience for correct implementations and the #[pymethods] escape hatch for weird cases will remain.

I'm happy to either block this PR on landing eq first or proceed with this PR and add a note in #4207 that we should require this later.

Icxolu commented 4 months ago

I think I might just implement the eq option, so we can merge this in the proper order.

Icxolu commented 4 months ago

Rebased. hash now requires eq in addition to frozen. I've also changed the generated slot the same way as in eq, so that manual __hash__ with the hash option errors.

Stargateur commented 2 months ago

This option is only available for frozen classes to prevent accidental hash changes from mutating the object. If you need an __hash__ implementation for a mutable class, use the manual method from above.

I don't get it, how accidental ??? If I mutate an object I expect the hash change. This seriously limit the use of this automatic derive.

davidhewitt commented 2 months ago

Mutating an object hash in Python is usually a mistake - e.g. because it breaks dictionaries. From the Python docs:

If a class does not define an eq() method it should not define a hash() operation either; if it defines eq() but not hash(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an eq() method, it should not implement hash(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

birkenfeld commented 2 months ago

To add to that, in Rust you can't easily mutate an object while it's in a HashMap (only via interior mutability or unsafe) since it can control that you never get a &mut reference to it.

Python can't offer such a guarantee, and therefore only lets you use immutable objects as keys or set items by default. Of course you can implement your own hash and wreak havoc.

Stargateur commented 2 months ago

Yes of course like in Rust key inside a hashmap should not be modified in a way that change the hash but that a solf error from the user, it's NOT enforced at compile time even in Rust. Well I don't know exactly how python works, but this mean I can't use this derive if I have a rust type that CAN be mutable. My use case is pretty simple I want my Rust struct to be hashable, and I have serialization and deserialisation feature that somehow need to construct dummy value and then set state so they need to be mutable... I blame python pickle for asking user non sense. Anyway, the only requirement is the user should not mutate the key of a hashmap (like in rust) and that for me obvious, but require this at compile time is very annoying making this derive not very useful.

birkenfeld commented 2 months ago

If pickle is the reason why your class needs to be mutable, you should be able to use the __getnewargs__ protocol which will not set state after the object is constructed.

Stargateur commented 2 months ago

it's impossible my object can't be construct without specific parameter that I don't have access anymore after and this doesn't change my main argument, any non trivial struct that could be mutable will not be able to use this derive. Please ignore my specific usecase and just accept this struct also have mutable feature that user will not use when the struct is in a dict.