capi-workgroup / decisions

Discussion and voting on specific issues
8 stars 1 forks source link

Add PyObject_GenericHash() function #5

Closed serhiy-storchaka closed 10 months ago

serhiy-storchaka commented 11 months ago

Issue: https://github.com/python/cpython/issues/113024 PR: https://github.com/python/cpython/pull/113025

Py_hash_t PyObject_GenericHash(PyObject *obj)

It is the implementation of the default Python object hashing function object.__hash__(). In CPython it is identical to Py_HashPointer(), but has the signature compatible with tp_hash. It can only be used for Python objects, not arbitrary pointers, and it is more portable. It can be used in implementations that have no equivalence between object identity and object address.

PyObject_GenericHash() should be used for calculating the hash of Python object. Py_HashPointer() should be used for pointers not managed by Python object manger (i.e. static data, malloced data or pointers to C functions like wrapperobject.descr).

encukou commented 10 months ago

+1 on the idea. I agree with the direction this is taking us, but I feel that this direction should be noted explicitly: This is similar to API like Py_Is or *PyRef_Dup, which are equivalent to ==/Py_IncRef on CPython, but signal that the author wants “HPy-like" semantics on implementations where they're available. IOW, this is acknowledging that HPy is a good idea :)


For this function specifically, I'd prefer calling it PyObject_HashIdentity. It should be used only for objects that are compared by identity, which doesn't sound too generic. (Other Generic API is, very roughly, functions added automatically to a subclass of a built-in type.)

vstinner commented 10 months ago

+1 to add the function. Using PyObject_GenericHash(obj) instead of Py_HashDouble(obj) should help to make code compatible with PyPy, and other Python implementations where id() doesn't directly return an object address, but something more complicated.

hodgestar commented 10 months ago

+1 on the function too. It looks like a great idea.

Regarding the name, why not just PyObject_Hash?

fangerer commented 10 months ago

For this function specifically, I'd prefer calling it PyObject_HashIdentity.

👍

encukou commented 10 months ago

Regarding the name, why not just PyObject_Hash?

That's a different function -- the equivalent of hash(obj). It already exists.

hodgestar commented 10 months ago

Regarding the name, why not just PyObject_Hash?

That's a different function -- the equivalent of hash(obj). It already exists.

Doh! Thanks for explaining. Now that I understand a bit better, having a function that returns a Py_hash_t equivalent of the identity sounds very useful for C extensions and HPy. It'll give people a simple alternative to using the PyObject pointer value as a hash.

serhiy-storchaka commented 10 months ago

On other hand, this API itself is not very useful without adding a corresponding API for id() other than PyLong_FromVoidPtr() and the reversed API instead of PyLong_AsVoidPtr(). And most importantly without replacing all assignments = and comparisons == with API calls. It is too large breaking change, and I do not see it in a foresight future.

As for a name, we have PyObject_GenericGetAttr(), PyObject_GenericSetAttr(), PyObject_GenericGetDict(), PyObject_GenericSetDict(), PyType_GenericAlloc(), PyType_GenericNew() as precedence.

vstinner commented 10 months ago

And most importantly without replacing all assignments = and comparisons == with API calls.

I added Py_Is(x, y) function. Are you referring to that?

vstinner commented 10 months ago

Ping @iritkatriel and @gvanrossum: what's your opinion on this API?

gvanrossum commented 10 months ago

Looks like we're all in favor now.

vstinner commented 10 months ago

@serhiy-storchaka: Congrats, your API is approved, you can go ahead ;-)