JuliaPy / PyCall.jl

Package to call Python functions from the Julia language
MIT License
1.46k stars 187 forks source link

Convert AbstractArrays with `strides` to NumPy arrays #876

Closed mkitti closed 2 years ago

mkitti commented 3 years ago

StridedArrays can currently be converted to NumPy arrays with PyCall.jl. However, other types of AbstractArray also have a stride method. In particular, views such as SubArray, ReshapedArray, ReinterpretArray, and PermutedDimsArray implement stride but are not included in StridedArray.

This pull request:

  1. Widens PyCall.NpyArray to accept any AbstractArray
  2. If the array is immutable, embeds a reference to the parent array in the PyObject.
  3. To support slicing, a new type PySlice is created.
stevengj commented 3 years ago

Julia's PermutedDimsArray and StrideSubArray can transmitted to Python/Numpy without copying by taking advantage of numpy.transpose and Python's slice. The strategy is to create a view of the parent array and then use Numpy to perform similar view based operations on the array as was done in Julia

That shouldn't be necessary. You should be able to create a NumPy array directly from any array with strided data, just by broadening the signature of this method.

mkitti commented 3 years ago

Julia's PermutedDimsArray and StrideSubArray can transmitted to Python/Numpy without copying by taking advantage of numpy.transpose and Python's slice. The strategy is to create a view of the parent array and then use Numpy to perform similar view based operations on the array as was done in Julia

That shouldn't be necessary. You should be able to create a NumPy array directly from any array with strided data, just by broadening the signature of this method.

Great. That will make things easier. I think we will need to modify some additional methods to dispatch to that method.

I'm not sure if I understand the dispatching with the code below in conversions.jl. It looks like we should be able to direct any AbstractArray with the method stride to the code you mention. Currently, such an array gets dispatched to:

https://github.com/JuliaPy/PyCall.jl/blob/318c23f3323d859b37ab4682339c03cb339e9135/src/conversions.jl#L313-L333

Rather this needs to be directed to the NpyArray method you mentioned.

https://github.com/JuliaPy/PyCall.jl/blob/318c23f3323d859b37ab4682339c03cb339e9135/src/numpy.jl#L175-L195

Perhaps we should dispatch using a Holy trait:

Basically, this would mean that PyCall would try to convert any AbstractArray that has the method strided to a NumPy array rather than relying on StridedArray to dispatch.

Is changing PyObject dispatch in this way what you had in mind?

stevengj commented 3 years ago

I'm not sure we want to introduce a trait. Just changing this signature to PyObject(a::AbstractArray{T}) where T<:PYARR_TYPES (and similarly for NpyArray, which should probably be npyarray) should suffice to duck-type it — any type that doesn't have strides and a pointer conversion will fall back to array2py.

codecov-io commented 3 years ago

Codecov Report

Merging #876 (d04dc61) into master (ead312f) will increase coverage by 0.33%. The diff coverage is 80.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #876      +/-   ##
==========================================
+ Coverage   67.63%   67.97%   +0.33%     
==========================================
  Files          20       20              
  Lines        1965     1992      +27     
==========================================
+ Hits         1329     1354      +25     
- Misses        636      638       +2     
Flag Coverage Δ
unittests 67.97% <80.64%> (+0.33%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/PyCall.jl 69.56% <ø> (ø)
src/gc.jl 85.00% <62.50%> (-15.00%) :arrow_down:
src/conversions.jl 64.33% <84.21%> (+0.95%) :arrow_up:
src/numpy.jl 77.33% <100.00%> (+2.66%) :arrow_up:
src/pyinit.jl 82.82% <100.00%> (+0.17%) :arrow_up:
src/pyarray.jl 71.23% <0.00%> (+0.68%) :arrow_up:
src/pyoperators.jl 87.50% <0.00%> (+4.16%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ead312f...d04dc61. Read the comment docs.

mkitti commented 3 years ago

I was trying to sort out the Core.TypeofVararg (see https://github.com/JuliaLang/julia/pull/38136 ) issue with the test on line 53. Turns out that the AOT CI script needing an overhaul now that some issues have been resolved. Previously it was always testing master.

mkitti commented 3 years ago

@stevengj Let me know if you would like to split the slice and PySlice code into another PR. I could also rebase this if you would like.

mkitti commented 3 years ago

I removed the slice code from this pull request. If we're interested, I can submit that as a distinct pull request. The diff is now smaller.

PhilipVinc commented 3 years ago

What’s the status on this? I would find it useful, so I would be willing to commit some time if necessary.

mkitti commented 3 years ago

Hi @stevengj , I was going through my old PRs, and I was wonder where we were on this pull request to expand no-copy conversion of Julia arrays to NumPy arrays? Is there anything you would like to change before merging?

PhilipVinc commented 3 years ago

@stevengj bump

mkitti commented 3 years ago

For those who need this sooner rather than later, here's a version I created for a Discourse post:

julia> using PyCall

julia> A = rand(Int, 256);

julia> pytypeof( PyObject(reinterpret(UInt64, A)) )
PyObject <class 'list'>

julia> pytypeof( PyObject(@view(A[1:100])) )
PyObject <class 'list'>

julia> module PyCallHack
           import PyCall: NpyArray, PYARR_TYPES, @npyinitialize, npy_api, npy_type
           import PyCall: @pycheck, NPY_ARRAY_ALIGNED, NPY_ARRAY_WRITEABLE, pyembed
           import PyCall: PyObject, PyPtr
           const HACKED_ARRAYS = Union{SubArray{T}, Base.ReinterpretArray{T}, Base.ReshapedArray{T}, Base.PermutedDimsArray{T}} where T <: PYARR_TYPES
           function NpyArray(a::HACKED_ARRAYS{T}, revdims::Bool) where T <: PYARR_TYPES
               @npyinitialize
               size_a = revdims ? reverse(size(a)) : size(a)
               strides_a = revdims ? reverse(strides(a)) : strides(a)
               p = @pycheck ccall(npy_api[:PyArray_New], PyPtr,
                   (PyPtr,Cint,Ptr{Int},Cint, Ptr{Int},Ptr{T}, Cint,Cint,PyPtr),
                   npy_api[:PyArray_Type],
                   ndims(a), Int[size_a...], npy_type(T),
                   Int[strides_a...] * sizeof(eltype(a)), a, sizeof(eltype(a)),
                   NPY_ARRAY_ALIGNED | NPY_ARRAY_WRITEABLE,
                   C_NULL)
              return PyObject(p, a)
           end
           pyembed(po::PyObject, jo::HACKED_ARRAYS) = pyembed(po, jo.parent)
       end
Main.PyCallHack

julia> pytypeof( PyObject(reinterpret(UInt64, A)) )
PyObject <class 'numpy.ndarray'>

julia> pytypeof( PyObject(@view(A[1:100])) )
PyObject <class 'numpy.ndarray'>
PhilipVinc commented 3 years ago

@stevengj bump

mkitti commented 3 years ago

I've initiated the registration of https://github.com/mkitti/NumPyArrays.jl which brings most of the functionality of this PR to a new type, NumPyArray. NumPyArray is just a wrapper of PyArray but has a constructor for an AbstractArray.

julia> using PyCall, NumPyArrays

julia> rA = reinterpret(Int8, rand(UInt8, 4,4))
4×4 reinterpret(Int8, ::Matrix{UInt8}):
 115   -4  -20   -32
 -43   13  -62   -36
 -84  -37   91  -109
 114  -40    9    73

julia> pytypeof(PyObject(rA))
PyObject <class 'list'>

julia> nprA = NumPyArray(rA)
4×4 NumPyArray{Int8, 2}:
 115   -4  -20   -32
 -43   13  -62   -36
 -84  -37   91  -109
 114  -40    9    73

julia> pytypeof(nprA)
PyObject <class 'numpy.ndarray'>

julia> sum(nprA)
-52

julia> pointer(rA) == pointer(nprA)
true

julia> np = pyimport("numpy")
PyObject <module 'numpy' from 'C:\\Users\\kittisopikulm\\.julia\\conda\\3\\lib\\site-packages\\numpy\\__init__.py'>

julia> np.sum(nprA)
-52

julia> PyObject(rA).dtype
ERROR: KeyError: key :dtype not found
Stacktrace:
 [1] __getproperty(o::PyObject, s::Symbol)
   @ PyCall ~\.julia\packages\PyCall\BD546\src\PyCall.jl:307
 [2] getproperty(o::PyObject, s::Symbol)
   @ PyCall ~\.julia\packages\PyCall\BD546\src\PyCall.jl:312
 [3] top-level scope
   @ REPL[33]:1

julia> PyObject(nprA).dtype
PyObject dtype('int8')

julia> py"""
       def add_one(A):
           return A + 1

       """

julia> py_add_one = py"add_one"
PyObject <function add_one at 0x00000000014BFF70>

julia> py_add_one(nprA)
4×4 Matrix{Int8}:
 116   -3  -19   -31
 -42   14  -61   -35
 -83  -36   92  -108
 115  -39   10    74

julia> py_add_one(rA)
ERROR: PyError ...

julia> py_add_one(PyObject(rA))
ERROR: PyError ...
mkitti commented 2 years ago

I intend on closing this pull request on January 11th.

mkitti commented 2 years ago

Closing this. See https://github.com/mkitti/NumPyArrays.jl if interested.