Open behrica opened 7 months ago
Root cause is this:
(def newsgroups (sklearn.datasets/fetch_20newsgroups :subset "all" :remove (builtins/tuple [ "headers" "footers" "quotes"])))
(.hashCode newsgroups)
failing with:
Unhandled java.lang.Exception
TypeError: unhashable type: 'Bunch'
There seems to be a widely held opinion in the Java community that .hashCode implementations should never throw exceptions.
But this seems to be a very special case in Python, where a Python type is not hashable.
Bunch extends dict, and dict is not hashable in Python.
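To make the Python side of this concrete, here is a minimal sketch of why a dict subclass is unhashable. The `Bunch` class below is a hypothetical stand-in written for this example (it is not the scikit-learn class, which has more features), and `try_hash` is an illustrative helper, not part of any library.

```python
# Hypothetical stand-in for sklearn's Bunch: a dict subclass with
# attribute-style access. dict sets __hash__ to None, and subclasses
# inherit that unless they define __hash__ themselves.
class Bunch(dict):
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


def try_hash(obj):
    """Return ("ok", hash) or ("error", message) instead of raising."""
    try:
        return ("ok", hash(obj))
    except TypeError as exc:
        return ("error", str(exc))


dict_result = try_hash({"a": 1})
bunch_result = try_hash(Bunch(data=[1, 2, 3]))
list_result = try_hash([])
str_result = try_hash("hashable")

for name, result in [("dict", dict_result), ("Bunch", bunch_result),
                     ("list", list_result), ("str", str_result)]:
    print(name, result[0])
```

The error message names the concrete type, which is why the original traceback mentions 'Bunch' rather than 'dict'.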
Interesting. I can tell you that the user interface philosophy so far has been:
Converting a list to a vector would not allow using .append() style methods on the Python list.
:bind-ns allows a user to have a Python module bound to a Clojure namespace symbol.
This is the first time I'm aware of that there has been a conflict with a Java idiom. For instance, hash([]) throwing an error is expected Python behavior. I suppose the acceptable solution would be, "allow tools that are expecting Java objects to have objects that behave like Java objects, but Python code expecting Python objects should have Python objects that behave like Python objects." I'm sure there's a more elegant way to phrase that, and I can already see the conceptual difficulty with figuring out how to approach the problem.
The simple approach would be to patch the hashing behavior for Java, so maybe that's best. I don't think many libpython-clj users would be overly upset that calling hash on an unhashable object would return nil rather than throw an error.
Hmm. Or embrace and extend the python dict type somehow to support clojure's algorithm for hashing.
I think returning 0 is also acceptable for a Java hashCode implementation (better than nil).
There are some reports that Clerk cannot render the "newsgroup" objects, not sure if it is for the same reason. https://clojurians.slack.com/archives/CLR5FD4ET/p1701986985003569
Maybe the code here: https://github.com/clj-python/libpython-clj/blob/073a887e9ddb0f74a48aa34b91b013a67ec71401/src/libpython_clj2/python/ffi.clj#L708
should catch TypeError: unhashable type:
and not throw, but return 0 instead.
Just to address the point that "non hashable" in Python is "expected", while in Java it's not.
> Hmm. Or embrace and extend the python dict type somehow to support clojure's algorithm for hashing.
As a hacker, I would love this approach and I think it would fulfill the original intent of introspectable datastructures. Less hacker-inclined devs may fume a bit at the potential implications of sets of mutable dicts and dicts as keys, and other potential footguns. Not sure what the effort would be to make this behavior opt-in or toggle-able.
As a far simpler example, we can take this for discussion:
(hash
(py/->py-dict {:a 1}))
This should NOT fail in my view, but it does:
1. Unhandled java.lang.Exception
TypeError: unhashable type: 'dict'
ffi.clj: 707 libpython-clj2.python.ffi/check-error-throw
ffi.clj: 705 libpython-clj2.python.ffi/check-error-throw
base.clj: 180 libpython-clj2.python.base/hash-code
base.clj: -1 libpython-clj2.python.base/hash-code
bridge_as_jvm.clj: 231 libpython-clj2.python.bridge-as-jvm/generic-python-as-map/reify
Util.java: 173 clojure.lang.Util/hasheq
core.clj: 5197 clojure.core/hash
core.clj: 5190 clojure.core/hash
REPL: 45 testLibPy.testLibPy/ev
In my view, in the same way we have a default behaviour for toString:
(str
(py/->py-dict {:a 1}))
which should return "a string" for any libpython object, we should return "a number" for any libpython object when .hashCode is called on it (and similarly for equals()).
Which algorithm to use to calculate the hashcode is then a less important consideration. (Returning 0 would already be better than an exception.)
Well, this discussion has also inspired me to open a new issue, because analytically I think a very important tool that is currently missing is the equivalent of py->clj and clj->py, analogous to js->clj and clj->js. Then it would be rather straightforward to do the (.hashCode (py->clj dict)). The obvious issue of course is that there is not a 1:1 correspondence, and it may be only marginally more useful than casting to json.
The issue with str, @behrica, is that Python objects are free to implement (or not) their own str implementation, and there would be a lot of unhelpful and borderline random behavior using str as a hashing key.
I am not suggesting using str as a hashing key. In my view, we should catch "TypeError: unhashable type" here: https://github.com/clj-python/libpython-clj/blob/073a887e9ddb0f74a48aa34b91b013a67ec71401/src/libpython_clj2/python/ffi.clj#L708
and either return 0 as the hashcode, or return the id() of the object as the hashcode.
Both will comply with the hashcode/equals rules of Java, I believe: https://www.baeldung.com/java-equals-hashcode-contracts
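The two fallbacks can be sketched on the Python side as follows. This is only an illustration of the idea, not the libpython-clj code at the linked line; `jvm_style_hash` and its `fallback` parameter are names made up for this example.

```python
def jvm_style_hash(obj, fallback="zero"):
    """Hash that does not raise for unhashable objects (hypothetical helper).

    fallback="zero" returns the constant 0; fallback="id" returns id(obj).
    """
    if fallback not in ("zero", "id"):
        raise ValueError("fallback must be 'zero' or 'id'")
    try:
        return hash(obj)
    except TypeError:
        # Assumption: any TypeError from hash() means "unhashable type".
        # A stricter version could inspect the message before swallowing it.
        return 0 if fallback == "zero" else id(obj)


a = {"a": 1}
b = {"a": 1}

zero_a, zero_b = jvm_style_hash(a), jvm_style_hash(b)
id_a, id_b = jvm_style_hash(a, "id"), jvm_style_hash(b, "id")

print(a == b)            # the two dicts compare equal by value
print(zero_a == zero_b)  # the constant fallback agrees for equal objects
print(id_a == id_b)      # the id() fallback differs for distinct objects
```

One design point worth checking: a constant 0 is consistent with any notion of equality, at the cost of every unhashable object colliding. The id() fallback gives equal hashes for equal objects only if equality is identity-based, so if equals() on the JVM side compares by value, two equal dicts would get different hashcodes.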
Or check here: https://github.com/clj-python/libpython-clj/blob/073a887e9ddb0f74a48aa34b91b013a67ec71401/src/libpython_clj2/python/base.clj#L178
whether the Python object is hashable: the presence of the attribute "__hash__" can be checked, since it is nil for unhashable objects.
(->
(py/->python "")
(py/get-attr "__hash__"))
;; => #object[tech.v3.datatype.ffi.Pointer 0x5406e4ba "{:address 0x00007EFD4368E8B0 }"]
;;
(->
(py/->py-dict {:a 1})
(py/get-attr "__hash__"))
;; => nil
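The same attribute check can be written in plain Python, which also shows one limitation of it. This is a sketch for discussion; `looks_hashable` and `is_hashable` are names invented here.

```python
def looks_hashable(obj):
    """Cheap check: the object's type has a __hash__ that is not None."""
    return getattr(type(obj), "__hash__", None) is not None


def is_hashable(obj):
    """Definitive check: actually try to hash the object."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


samples = {
    "str": "",
    "dict": {"a": 1},
    "tuple_of_list": ([],),
}

for name, value in samples.items():
    print(name, looks_hashable(value), is_hashable(value))
```

A tuple that contains a list passes the attribute check but still raises when hashed, so an attribute check alone would not remove the need to catch the TypeError.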
This issue makes Clerk and libpython-clj not work well together. Clerk is based on "value hashing" for caching, and crashes when the hash calculation of a value throws an exception.
Doing this and then opening newsgroups in the cider-inspector gives an error. It seems to happen only in cider-inspector.
...