clj-python / libpython-clj

Python bindings for Clojure
Eclipse Public License 2.0

error of unhashable type #259

Open behrica opened 7 months ago

behrica commented 7 months ago

Doing this

(require '[libpython-clj2.require :as pyreq])
(pyreq/require-python 'builtins)
(pyreq/require-python 'sklearn.datasets)
(def newsgroups (sklearn.datasets/fetch_20newsgroups :subset "all" :remove (builtins/tuple ["headers" "footers" "quotes"])))

and then opening newsgroups in the cider-inspector gives an error. It seems to happen only in cider-inspector ...

user> *e
;; => #error {
 :cause "TypeError: unhashable type: 'numpy.ndarray'\n"
 :via
 [{:type clojure.lang.ExceptionInfo
   :message nil
   :data #:clojure.error{:phase :print-eval-result}
   :at [clojure.main$repl$read_eval_print__9206 invoke "main.clj" 442]}
  {:type java.lang.Exception
   :message "TypeError: unhashable type: 'numpy.ndarray'\n"
   :at [libpython_clj2.python.ffi$check_error_throw invokeStatic "ffi.clj" 707]}]
 :trace
 [[libpython_clj2.python.ffi$check_error_throw invokeStatic "ffi.clj" 707]
  [libpython_clj2.python.ffi$check_error_throw invoke "ffi.clj" 705]
  [libpython_clj2.python.base$hash_code invokeStatic "base.clj" 180]
  [libpython_clj2.python.base$hash_code invokePrim "base.clj" -1]
  [libpython_clj2.python.bridge_as_jvm$generic_pyobject$reify__23789 hashCode "bridge_as_jvm.clj" 231]
  [clojure.lang.Util hasheq "Util.java" 173]
  [clojure.lang.Murmur3 hashOrdered "Murmur3.java" 107]
  [clojure.lang.ASeq hasheq "ASeq.java" 91]
  [clojure.lang.Util dohasheq "Util.java" 177]
  [clojure.lang.Util hasheq "Util.java" 168]
  [clojure.lang.PersistentHashMap hash "PersistentHashMap.java" 120]
  [clojure.lang.PersistentHashMap$TransientHashMap doAssoc "PersistentHashMap.java" 327]
  [clojure.lang.ATransientMap assoc "ATransientMap.java" 64]
  [clojure.lang.PersistentHashMap create "PersistentHashMap.java" 56]
  [clojure.lang.PersistentHashMap create "PersistentHashMap.java" 100]
  [clojure.lang.PersistentArrayMap createHT "PersistentArrayMap.java" 64]
  [clojure.lang.PersistentArrayMap assoc "PersistentArrayMap.java" 258]
  [clojure.lang.PersistentArrayMap assoc "PersistentArrayMap.java" 30]
  [clojure.lang.RT assoc "RT.java" 827]
  [clojure.core$assoc__5481 invokeStatic "core.clj" 193]
  [clojure.core$assoc__5481 invoke "core.clj" 192]
  [clojure.lang.Atom swap "Atom.java" 65]
  [clojure.core$swap_BANG_ invokeStatic "core.clj" 2371]
  [clojure.core$memoize$fn__6946 doInvoke "core.clj" 6388]
  [clojure.lang.RestFn invoke "RestFn.java" 421]
  [orchard.inspect$eval7170$fn__7175$fn__7188 invoke "inspect.clj" 660]
  [clojure.core$group_by$fn__8597 invoke "core.clj" 7224]
  [clojure.core.protocols$fn__8249 invokeStatic "protocols.clj" 168]
  [clojure.core.protocols$fn__8249 invoke "protocols.clj" 124]
  [clojure.core.protocols$fn__8204$G__8199__8213 invoke "protocols.clj" 19]
  [clojure.core.protocols$seq_reduce invokeStatic "protocols.clj" 31]
  [clojure.core.protocols$fn__8236 invokeStatic "protocols.clj" 75]
  [clojure.core.protocols$fn__8236 invoke "protocols.clj" 75]
  [clojure.core.protocols$fn__8178$G__8173__8191 invoke "protocols.clj" 13]
  [clojure.core$reduce invokeStatic "core.clj" 6886]
  [clojure.core$group_by invokeStatic "core.clj" 7214]
  [clojure.core$group_by invoke "core.clj" 7214]
  [orchard.inspect$eval7170$fn__7175 invoke "inspect.clj" 658]
  [clojure.lang.MultiFn invoke "MultiFn.java" 234]
  [orchard.inspect$inspect_render invokeStatic "inspect.clj" 792]

behrica commented 7 months ago

Root cause is this:

(def newsgroups (sklearn.datasets/fetch_20newsgroups :subset "all" :remove (builtins/tuple [ "headers" "footers" "quotes"])))
(.hashCode newsgroups)

failing with:

 Unhandled java.lang.Exception
   TypeError: unhashable type: 'Bunch'

behrica commented 7 months ago

The prevailing opinion in the Java community seems to be that .hashCode implementations should never throw exceptions.

behrica commented 7 months ago

But this seems to be a very special case in Python, where a Python type is not hashable. Bunch extends dict, and dict is not hashable in Python.
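The Python side of this can be reproduced without scikit-learn. The sketch below uses a hypothetical stand-in class named FakeBunch (not the real Bunch from sklearn) and a hypothetical helper try_hash, only to show that a dict subclass inherits dict's missing hash:

```python
class FakeBunch(dict):
    """Hypothetical stand-in for sklearn's Bunch: a plain dict subclass."""


def try_hash(obj):
    """Return hash(obj), or the TypeError message if obj is unhashable."""
    try:
        return hash(obj)
    except TypeError as err:
        return str(err)


print(try_hash("text"))            # an int; str is hashable
print(try_hash({"a": 1}))          # a TypeError message naming 'dict'
print(try_hash(FakeBunch(a=1)))    # a TypeError message naming 'FakeBunch'
print(FakeBunch.__hash__ is None)  # True: dict sets __hash__ to None
```

Because dict sets `__hash__` to None, any subclass that does not define its own `__hash__` stays unhashable.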

jjtolton commented 7 months ago

Interesting. I can tell you that the user interface philosophy so far has been:

  1. Default to Clojure idioms, unless adopting Clojure idioms would prevent certain Python behavior -- i.e., automatically casting a Python list to a vector would not allow using .append() style methods to the Python list.
  2. Allow opt-in Python idioms where appropriate, i.e., :bind-ns allows a user to have a Python module bound to a Clojure namespace symbol.

This is the first time I'm aware of that there has been a conflict with a Java idiom. For instance, hash([]) throwing an error is expected Python behavior. I suppose the acceptable solution would be, "allow tools that are expecting Java objects to behave like Java objects have objects that behave like Java objects, but Python code expecting Python objects should have Python objects that behave like Python objects." I'm sure there's a more elegant way to phrase that, and I can already see the conceptual difficulty with figuring out how to approach the problem.

The simple approach would be to patch the hashing behavior for Java, so maybe that's best. I don't think many libpython-clj users would be overly upset that calling hash on an unhashable object would return nil rather than throw an error.

cnuernber commented 6 months ago

Hmm. Or embrace and extend the python dict type somehow to support clojure's algorithm for hashing.

behrica commented 6 months ago

I think returning 0 is also acceptable for a Java hashCode implementation (better than nil).

behrica commented 6 months ago

There are some reports that Clerk cannot render the "newsgroups" objects; not sure if it is for the same reason. https://clojurians.slack.com/archives/CLR5FD4ET/p1701986985003569

behrica commented 6 months ago

Maybe the code here: https://github.com/clj-python/libpython-clj/blob/073a887e9ddb0f74a48aa34b91b013a67ec71401/src/libpython_clj2/python/ffi.clj#L708

should catch "TypeError: unhashable type" and return 0 instead of throwing.

This is just to address the point that being "non hashable" is "expected" in Python, while in Java it is not.

jjtolton commented 6 months ago

> Hmm. Or embrace and extend the python dict type somehow to support clojure's algorithm for hashing.

As a hacker, I would love this approach and I think it would fulfill the original intent of introspectable datastructures. Less hacker-inclined devs may fume a bit at the potential implications of sets of mutable dicts and dicts as keys, and other potential footguns. Not sure what the effort would be to make this behavior opt-in or toggle-able.
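To make the footgun concrete, here is a minimal Python sketch. HashableDict is a hypothetical class that hashes by its current contents; it is not anything libpython-clj provides. Mutating such a dict after it has been put into a set leaves it stored under a stale hash:

```python
class HashableDict(dict):
    """Hypothetical dict subclass that hashes by its current contents."""

    def __hash__(self):
        return hash(frozenset(self.items()))


key = HashableDict(a=1)
seen = {key}

found_before = key in seen  # True: lookup uses the same hash as insertion

key["a"] = 2                # mutate the dict after it was inserted

found_after = key in seen   # False: the set still files it under the old hash
still_stored = len(seen)    # 1: the element is there, just unreachable by lookup
```

The same thing would happen with a mutable dict used as a map key, which is the reason Python leaves dict unhashable in the first place.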

behrica commented 6 months ago

As a far simpler example, we can take this for discussion:

(hash
 (py/->py-dict {:a 1}))

This should NOT fail in my view, but it does:


1. Unhandled java.lang.Exception
   TypeError: unhashable type: 'dict'

                   ffi.clj:  707  libpython-clj2.python.ffi/check-error-throw
                   ffi.clj:  705  libpython-clj2.python.ffi/check-error-throw
                  base.clj:  180  libpython-clj2.python.base/hash-code
                  base.clj:   -1  libpython-clj2.python.base/hash-code
         bridge_as_jvm.clj:  231  libpython-clj2.python.bridge-as-jvm/generic-python-as-map/reify
                 Util.java:  173  clojure.lang.Util/hasheq
                  core.clj: 5197  clojure.core/hash
                  core.clj: 5190  clojure.core/hash
                      REPL:   45  testLibPy.testLibPy/ev

behrica commented 6 months ago

In my view, in the same way we have a default behaviour for toString:

(str
 (py/->py-dict {:a 1}))

which should return "a string" for any libpython object, we should return "a number" for any libpython object when .hashCode is called on it (and similarly for equals()).

Which algorithm is used to calculate the hash code is then a less important consideration (returning 0 would already be better than an exception).

jjtolton commented 6 months ago

Well this discussion has also inspired me to open a new issue, because analytically I think a very important tool that is currently missing is the equivalent of py->clj and clj->py, analogous to js->clj and clj->js. Then it would be rather straightforward to do the (.hashCode (py->clj dict)). The obvious issue of course is that there is not a 1:1 correspondence, and it may be only marginally more useful than casting to json.

The issue with str, @behrica, is that Python objects are free to implement (or not) their own str implementation, and there would be a lot of unhelpful and borderline random behavior when using str as a hashing key.

behrica commented 6 months ago

I am not suggesting to use str as hashing key. In my view, we should "catch" "TypeError: unhashable type", here: https://github.com/clj-python/libpython-clj/blob/073a887e9ddb0f74a48aa34b91b013a67ec71401/src/libpython_clj2/python/ffi.clj#L708

and either return 0 as the hash code, or return the id() of the object as the hash code.

Both will comply with the hashcode/equals rules of Java, I believe: https://www.baeldung.com/java-equals-hashcode-contracts
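The two fallbacks can be compared in plain Python. This is only a sketch: fallback_hash and its use_identity flag are hypothetical names, not libpython-clj code. Note also that id() can exceed 32 bits, so a real hashCode would have to fold it into a Java int.

```python
def fallback_hash(obj, use_identity=False):
    """Hypothetical helper: hash(obj) if possible, else a fallback value."""
    try:
        return hash(obj)
    except TypeError:
        # Assumes the TypeError means "unhashable type".
        return id(obj) if use_identity else 0


d1, d2 = {"a": 1}, {"a": 1}

# Constant fallback: objects that compare equal trivially share a hash code.
assert d1 == d2
assert fallback_hash(d1) == fallback_hash(d2) == 0

# Identity fallback: stable for one object, but d1 and d2 get different
# values although they compare equal by value.
assert fallback_hash(d1, use_identity=True) == fallback_hash(d1, use_identity=True)
assert fallback_hash(d1, use_identity=True) != fallback_hash(d2, use_identity=True)
```

The constant 0 keeps "equal objects have equal hash codes" true under any definition of equals, at the cost of every unhashable object landing in the same bucket. The id() variant spreads objects out, but it only satisfies the contract if equals on the bridged object is identity-based rather than value-based.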

behrica commented 6 months ago

Or check here: https://github.com/clj-python/libpython-clj/blob/073a887e9ddb0f74a48aa34b91b013a67ec71401/src/libpython_clj2/python/base.clj#L178

whether the Python object is hashable: the attribute "__hash__" can be checked for being nil.

(->
 (py/->python "")
 (py/get-attr "__hash__"))
;; => #object[tech.v3.datatype.ffi.Pointer 0x5406e4ba "{:address 0x00007EFD4368E8B0 }"]
;;
(->
 (py/->py-dict {:a 1})
 (py/get-attr "__hash__"))
;; => nil
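
For reference, the same check on the Python side looks roughly like this. is_hashable is a hypothetical helper name. The last line shows a caveat: the attribute check is necessary but not sufficient, because a tuple has `__hash__` yet fails to hash when it contains an unhashable element.

```python
from collections.abc import Hashable


def is_hashable(obj):
    """True if the object's type defines a __hash__ that is not None."""
    return getattr(type(obj), "__hash__", None) is not None


print(is_hashable(""))                 # True
print(is_hashable({"a": 1}))           # False
print(isinstance({"a": 1}, Hashable))  # False: the ABC performs the same check
print(is_hashable(([],)))              # True, although hash(([],)) raises TypeError
```

So a check on `__hash__` would cover dict and Bunch, but the call to hash can still fail for containers, and catching the TypeError would still be needed for those.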

behrica commented 5 months ago

This issue makes Clerk and libpython-clj not work well together. Clerk is based on "value hashing" for caching, and crashes when the hash calculation of a value throws an exception.
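
The failure mode is not specific to Clerk: any cache keyed by the hash of its argument behaves this way. As a rough analogy (not Clerk's actual implementation), Python's own functools.lru_cache fails in the same manner. The render function here is a hypothetical stand-in for a viewer:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def render(value):
    """Hypothetical stand-in for a viewer that caches results by argument hash."""
    return f"rendered {value!r}"


print(render("text"))  # rendered 'text'

cache_error = None
try:
    render({"a": 1})   # the cache has to hash the argument before calling render
except TypeError as err:
    cache_error = str(err)

print(cache_error)     # a TypeError message about an unhashable dict
```

The exception comes from the caching layer before the wrapped function ever runs, which matches the stack trace at the top of this issue, where the error surfaces inside clojure.core/memoize.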