clj-python / libpython-clj

Python bindings for Clojure
Eclipse Public License 2.0
1.08k stars 68 forks source link

require-python missing functions with annotation #250

Closed markgdawson closed 1 year ago

markgdawson commented 1 year ago

require-python does not load any functions which are annotated with a parametric list or set in their arguments or return values.

For example, if we have the following functions in a module test_module:

def test_fn1(arg1: int) -> int:
  pass

def test_fn2(arg1: list[int]) -> int:
  pass

and we require this with:

(require-python '[test_module :as tm]`

the function tm/test_fn1 will be available in the current namespace, but not tm/test_fn2.

This happens because require-python first uses datafy to turn the python module into clojure map which is then iterated over to add python functions to the newly-created tm clojure namespace.

However, datafy will ignore function test_fn2 (with warnings) and returns a map which contains test_fn1 but not test_fn2. This happens in this case because during reading the function metadata/annotations in the datafy call, the ->jvm function of the PCopyToJVM protocol is called to convert the function annotation information to jvm objects. The ->jvm function can't handle the annotations for test_fn2 and throws an exception, which causes it to be skipped.

It seems that annotations for parametric types of collections like list[int] and set[int] are of type GenericAlias. This type is not handled correctly by ->jvm. Calls with these types are dispatched to the :default implementation of py-proto/pyobject->jvm. This fails because (surprisingly?) these GenericAlias objects pass PyMapping_Check (here).

The docs suggestthat the presence of __get_item__ is the only criteria required for PyMapping_Check to return 1 (i.e. "pass"). The Python documentation for GenericAlias also suggests that it does indeed implement a __get_item__ method, but only in order to throw an exception "to disallow mistakes".

Unsurprisingly, the subsequent attempts in the :default handler to extract items from this generic type object fail. In the current implementation the GenericAlias object will first fail the check for PyMapping_Items here and then fails the check for PySequence_Check here. The resulting exception bubbles up to the datafy outer loop.

A potential super-simple fix affecting generic types only could be to change the PyMapping_Check here to something like this:

 (and (= 1 (py-ffi/PyMapping_Check pyobj))
           (not= :generic-alias (py-ffi/pyobject-type-kwd pyobj)))

I've tested this and it fixes the loading issues with require-python in this case.

An alternative would be to implement a new method for the py-proto/pyobject->jvm multimethod for dispatch value :generic-alias, which does the same as the final :else in the default handler.

Another (more involved) approach could be to move the cond dispatch logic from the method for :default into the defmulti call and attempt to dispatch directly to relevant methods based on results of python-type along with PyMapping_Check, PySequence_Check and PyMapping_Items.

cnuernber commented 1 year ago

This is a well written issue - very clear and makes complete sense to me. We will be following up on this or either fix you suggested in a PR would be great.

markgdawson commented 1 year ago

Thanks for the kind words @cnuernber. I've created a simple PR. I've gone with the approach of having a list of types to exclude from the default extraction for objects with PyMapping_Check is true. I'd already found another case (Union), with slightly differing behavior (crashes on PyMapping_Items), and presumably there may be more cases of these.

markgdawson commented 1 year ago

Seems to be fixed in 0.025.