JuliaPy / PythonCall.jl

Python and Julia in harmony.
https://juliapy.github.io/PythonCall.jl/stable/
MIT License

More convenient conversion #267

Open tecosaur opened 1 year ago

tecosaur commented 1 year ago

First off, thanks for all the effort you've put into this package! It's been wonderfully useful, and the integration with CondaPkg is just :ok_hand:.

Unfortunately, at the moment, it feels like whenever I want to write functions that work with Python, or (even worse) with both Python and Julia, I have to engage in a somewhat annoying dance with pyconvert.

I can't help but feel that the situation could be much nicer.

On pyconvert and convert

As an example, consider writing an MSE function that works with data from both Python and Julia. At the moment, I'd need to do something like this:

function mse(y, ŷ)
    # Convert Python objects to Julia vectors; pass Julia values through unchanged.
    native_y = y isa Py ? pyconvert(Vector, y) : y
    native_ŷ = ŷ isa Py ? pyconvert(Vector, ŷ) : ŷ
    sum((native_y .- native_ŷ) .^ 2)
end

I find myself regularly writing convert on instinct, getting method errors, and feeling tricked, since that's the muscle memory I have in Julia for "I want type X, I have y". From what I can tell, defining Base.convert(T::Type, obj::Py) wouldn't cause any issues, so I'm confused about why pyconvert exists at all.

Furthermore, since convert(::Type{T}, t::T) just returns t, it would allow for equivalent but much simpler code in quite a few places I think. Returning to the earlier example, the mse function could be rewritten as:

mse(y, ŷ) = sum((convert(Vector, y) .- convert(Vector, ŷ)) .^ 2)

I imagine this change could be made in a straightforward backwards compatible way like so:

Base.convert(T::Type, obj::Py) = pyconvert(T, obj)

On type information

Currently, Py is a struct with a single field and no parameters. I feel like it would be nice to expose Python type information. Short of constructing a Python type tree, perhaps it would be possible to cheaply encode some type information in a type parameter? Just as an example, say Py was changed to:

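mutable struct Py{pytype}
    ptr :: C.PyPtr
    # A parametric struct needs the parameter spelled out in both the constructor and `new`.
    Py{pytype}(::Val{:new}, ptr::C.PyPtr) where {pytype} = finalizer(py_finalizer, new{pytype}(ptr))
end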

From a quick benchmark, pytype seems very fast to run (10ns on my machine), so perhaps one could use it to populate the pytype parameter? E.g. Py{:list}? There's likely a better way of doing this; it just occurs to me that something along these lines could be helpful. This definitely needs more thought.

On automatic conversion

The fact that PythonCall does not eagerly make best-guess conversions from Python to Julia types has been quite good from a performance perspective, but I feel like there is still room for a convenience function to "just give me X in pure Julia form", say via a pynative function or similar.

It occurs to me that if Py had type annotations, as proposed earlier, then this could be done quite cleanly with multiple dispatch.

pynative(juliatype::Any) = juliatype
pynative(::Py{T}) where {T} = throw(MethodError(pynative, Tuple{Py{T}}))
pynative(list::Py{:list}) = map(pynative, convert(Vector, list))
...

On type promotion

I imagine guessing when it would make sense to convert to Julia would be a bit fraught, so while this would be nice theoretically, I have no particular ideas here.

cjdoris commented 1 year ago

Rest assured the same thoughts have crossed my mind!

Many of your points can be addressed with a single design decision: Python objects are their own type of thing with their own semantics, totally disjoint from any existing Julia semantics. This means for example that the Python int value 1 is not considered to be equal in any Julia sense to the Julia Int value 1.

This already explains your first point: since Python 1 and Julia 1 are not equal, you should not be able to convert between them. On the other hand, pyconvert is explicitly for converting a Python value to its Julia equivalent and has its own documented semantics.

One practical difference is that convert(Any, x) always returns x whereas pyconvert(Any, x) does not. PyCall gets around this by introducing a new type PyAny, but this is a weird hack IMO.
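To make that difference concrete, here is a minimal sketch, assuming PythonCall is installed and using only its exported Py and pyconvert. It does not pin down the exact Julia type that comes back, only that it is no longer a Py.

```julia
using PythonCall

x = Py(1)  # a Python int wrapped in a Py

# Base.convert to Any is the identity: the very same Py object comes back.
same = convert(Any, x)

# pyconvert applies PythonCall's conversion rules, so the result is a Julia value.
native = pyconvert(Any, x)

println(same === x)      # identity check on the Py object
println(native isa Py)   # the converted value is a plain Julia value
```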

PythonCall did in fact use to have the generic convert method that you suggest, but it had big performance issues because it invalidated pretty much all other methods for convert.

You can of course define your own one-line function which does convert or pyconvert depending on the input.
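A minimal sketch of such a helper, assuming PythonCall is installed. The name tojulia is hypothetical and not part of PythonCall; because it is a new function rather than a new method of Base.convert, it leaves the existing convert methods alone.

```julia
using PythonCall

# `tojulia` is a hypothetical helper name, not a PythonCall function.
tojulia(::Type{T}, x::Py) where {T} = pyconvert(T, x)  # Python object: PythonCall's rules
tojulia(::Type{T}, x) where {T} = convert(T, x)        # anything else: Base.convert

# Same quantity as the mse in the opening post (a sum of squared differences).
mse(y, ŷ) = sum((tojulia(Vector, y) .- tojulia(Vector, ŷ)) .^ 2)

y = [1.0, 2.0, 3.0]             # native Julia data
ŷ = pylist([1.0, 2.5, 2.0])     # Python list held in a Py
result = mse(y, ŷ)
```

Dispatch picks the Py method whenever the argument is a Py, so callers can mix Julia and Python inputs freely.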

The same argument rules out type promotion with Julia types - since promote relies on convert.

The above design decision also explains why we don't do automatic conversion back to Julia types - they just aren't the same thing. In particular, automatic conversion gets in the way of doing lots of Pythonic things in a row. For example x.my_list.append(12) would fail if x.my_list were converted to a Vector. A secondary concern is that such behaviour is type-unstable and slow.

I have considered including type information in Py too but again this would be type-unstable and slow. It's also not clear how it would even be useful - to be useful we'd need to know the full inheritance hierarchy of the Python types (so for example we know if something is a subtype of list) but if you want to use this for dispatch it's difficult because Python has multiple inheritance so its types cannot be faithfully represented as Julia types. The alternative is to treat Python type information as a runtime property, which is what PythonCall does.
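The type-instability concern can be sketched in plain Julia, without PythonCall. Tagged and wrap below are hypothetical stand-ins for a parametrised Py and its constructor, assumed only for illustration.

```julia
# `Tagged` is a hypothetical stand-in for a parametrised Py; it is not PythonCall code.
struct Tagged{tag}
    ptr::Ptr{Cvoid}
end

# The tag is only known at run time, so it has to be lifted into the type here.
wrap(tag::Symbol, ptr::Ptr{Cvoid}) = Tagged{tag}(ptr)

# Inference sees only `Symbol`, so it cannot name a concrete return type.
inferred = only(Base.return_types(wrap, (Symbol, Ptr{Cvoid})))
println(isconcretetype(inferred))

# At run time the concrete type depends on the value that was passed in.
a = wrap(:list, C_NULL)
b = wrap(:dict, C_NULL)
println(typeof(a) == typeof(b))
```

Any code that receives the result of wrap has to dispatch dynamically, which is the cost being described.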

cjdoris commented 1 year ago

As for converting back to Julia native types, this has been a common request. I am currently rewriting the conversion internals, and one thing it will allow is flags to control the conversion. Soon you will be able to do pyconvert(Any, x; copy=true) to get a copying conversion, so that a list is copied to a Vector instead of wrapped as a PyList.

tecosaur commented 1 year ago

Thanks for the detailed reply. One thing I'm not sure about, though, is this:

PythonCall did in fact use to have the generic convert method that you suggest, but it had big performance issues because it invalidated pretty much all other methods for convert.

Would Base.convert(T::Type, obj::Py) = pyconvert(T, obj) actually cause many method invalidations? If you could explain this a bit more, that would be appreciated.

cjdoris commented 1 year ago

My memory about the invalidations is hazy - I'm no expert on that stuff. But I followed the SnoopCompile tutorial and at some point found a huge pile of stuff (invalidations maybe) connected to convert. After I removed that method, the pile of stuff disappeared and the package import time went down by about a second.

If you're interested, you could modify PythonCall yourself and investigate the effect.

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open for 30 days with no activity. If the issue is still relevant then please leave a comment, or else it will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue has been closed because it has been stale for 7 days. You can re-open it if it is still relevant.