JuliaPy / PythonCall.jl

Python and Julia in harmony.
https://juliapy.github.io/PythonCall.jl/stable/
MIT License
776 stars 63 forks source link

Julia objects are falsely marked as subtypes of most Python abstract base classes. #390

Open LilithHafner opened 11 months ago

LilithHafner commented 11 months ago

Affects: JuliaCall

Describe the bug

An example of this that I ran into is that everything is reported as itterable, even when it cannot be iterated.

This looks pretty bad on the surface:

>>> import collections
>>> isinstance([1,2,3], collections.abc.Iterable)
True
>>> isinstance(1, collections.abc.Iterable)
False
>>> isinstance(sum, collections.abc.Iterable)
False
>>> f = jl.seval("f(x) = x+1")
>>> isinstance(f, collections.abc.Iterable)
True
>>> s = jl.seval(":a")
>>> isinstance(s, collections.abc.Iterable)
True
>>> f
Julia: f (generic function with 1 method)
>>> s
Julia: :a

And indeed it causes bugs when passing non-iterable Julia objects to Python functions which check for iterability and then, if iterable, iterate. For example, this function in PyTorch:

https://github.com/pytorch/pytorch/blob/185e76238d9f634c9c8d6fc6be75aa4fc71e04d3/torch/autograd/gradcheck.py#L78-L87

LilithHafner commented 11 months ago

This is because collections.abc.Iterable is an abstract base class ("abc") which means being an instance of it is defined in terms of having certain methods (analogous to Julia's' hasmethod). In Julia, it would be unidiomatic to define subtyping or even traits in this way because it is common to define generic methods that sometimes fail but are always applicable.

Apparently, this is not the case in python, so by defining the appropriate iteration methods on all wrapped Julia objects, regardless, even, of weather or not they hasmethod those methods, we falsely advertize all Julia objects as members of many abstract base classes.

cjdoris commented 11 months ago

For completeness, the full list of collections.abc types that juliacall.AnyValue is an instance of is Container, Hashable, Iterable, Reversible, Sized, Callable, Collection, Buffer.

This corresponds to the methods __contains__, __hash__, __iter__, __reversed__, __len__ and __buffer__ which we always define, even if they throw for most types.

cjdoris commented 11 months ago

My main observation here is that if you want isinstance(x, Iterable) to work reliably for Julia objects x, then you need a reliable way in Julia to test if an object is iterable, which doesn't exist.

LilithHafner commented 11 months ago

I agree that we can't get isinstance to be right all the time. However, it would help my use case (working with existing Python packages that don't accommodate this weakness) if we were wrong a little less often.

While there is no reliable isinstance or hasmethod for Julia objects, there is an unreliable hasmethod which is a strict improvement and solves the examples in the OP.

julia> hasmethod(iterate, Tuple{Array})
true

julia> hasmethod(iterate, Tuple{typeof(sum)})
false

julia> hasmethod(iterate, Tuple{Symbol})
false

Given a Julia known Julia object, it is possible to wrap it in a zero-runtime-overhead Julia object that defines the appropriate methods so that hasmethod is accurate for that object. If PythonCall/JuliaCall use hasmethod, then this can be used as a workaround. Right now the only workaround I know of is to wrap the Julia object in a Python object which, in my case, results in runtime overhead.

Respecting hasmethod will fix many cases of this bug and enable a zero-overhead workaround in the remaining cases.

cjdoris commented 11 months ago

Honestly there's so much runtime overhead in doing anything with wrapped Julia objects I wouldn't worry about that :)

LilithHafner commented 11 months ago

My usecase is passing the object into a higher order Julia function. AFAICT the wrapper is stripped away and there is zero overhead with respect to pure Julia.

cjdoris commented 11 months ago

Got you. I think a future version will consider any Python object implementing __julia__() or something as a wrapper of the Julia object it returns. Then you can write your own wrappers with whatever interface you like (e.g. not supporting iteration) but which are simply unwrapped by default when passed back to Julia.

cjdoris commented 11 months ago

Having __iter__ present or not depending on the Julia type requires creating different Python types, whereas currently there is essentially only generic one.

I think I'm in favour of keeping the current one-type-for-all approach (in the name of genericness) plus the above __julia__ trait based way to provide custom wrapper types.