JuliaPy / PyCall.jl

Package to call Python functions from the Julia language
MIT License

Non-copy getting Arrow array? #872

Closed Moelf closed 3 years ago

Moelf commented 3 years ago

The package awkward provides support for ragged arrays:

arr = [[1,2,3], [], [4,5]]

Obviously NumPy doesn't support this, but Arrow does. Is there a way to interface with such an object so that we don't have to copy?

If there is no general way, is there a way for me to implement this specifically for this Python package?

stevengj commented 3 years ago

Yes, just do pycall(ak.to_arrayset, PyObject, ...), and so on. By passing PyObject to the pycall function, you tell it not to convert/copy to a Julia object.
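
As a rough illustration of the return-type argument, here is a minimal sketch that uses NumPy instead of awkward. It assumes NumPy is installed in the Python environment that PyCall uses. It contrasts the default conversion with the `PyObject` and `PyArray` return types:

```julia
using PyCall

np = pyimport("numpy")  # assumption: NumPy is available to PyCall's Python

# Default call: PyCall converts the result into a native Julia array (a copy).
a = np.arange(5)

# Return type PyObject: no conversion, the result stays an opaque Python object.
o = pycall(np.arange, PyObject, 5)

# Return type PyArray: a no-copy AbstractArray wrapper around the NumPy buffer.
p = pycall(np.arange, PyArray, 5)
```

`PyArray` only works for objects that expose a buffer that PyCall can wrap, which is why it is shown here with a NumPy array.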

Moelf commented 3 years ago

Sorry, I wasn't clear in my question. This works, but I can't use this Python Arrow array like a Julia array:

julia> @time ak.to_arrow(arr)
  0.001698 seconds (6 allocations: 288 bytes)
PyObject <pyarrow.lib.StructArray object at 0x7ff0b726ca68>
-- is_valid: all not null
-- child 0 type: list<item: double>
  [
    [
      0.304889,
      0.353568,
      1.23984,
      2.58227,
.....

julia> PyArray(ak.to_arrow(arr))
ERROR: PyError ($(Expr(:escape, :(ccall(#= /home/net3/jiling/.julia/packages/PyCall/BcTLp/src/pybuffer.jl:124 =# @pysym(:PyObject_GetBuffer), Cint, (PyPtr, Ref{PyBuffer}, Cint), o, b, flags))))) <class 'TypeError'>
TypeError("a bytes-like object is required, not 'pyarrow.lib.StructArray'",)

Moelf commented 3 years ago

@stevengj I believe that, technically, I should be able to do

julia> pycall(ak.to_arrow, Arrow.List, arr)
ERROR: MethodError: Cannot `convert` an object of type
  PyObject to an object of type
  Arrow.List
Closest candidates are:
  convert(::Type{T}, ::T) 

since an Arrow List should have a consistent memory layout across the board. But I'm not sure whether I can make this happen trivially. Maybe it's also a question for @quinnj (if it's possible at all).

stevengj commented 3 years ago

PyObject is all you need here. Arrow.List objects are PyObjects.

Moelf commented 3 years ago

Sorry, the Arrow.List is from Arrow.jl.

What I want is a ragged vector that can be iterated/mapped over by Julia functions without significant slowdown from calling into libpython.
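
To make the goal concrete, here is a minimal pure-Julia sketch of such a ragged vector. It assumes the usual Arrow list layout: one flat values buffer plus an offsets buffer with one more entry than there are lists. `RaggedView` is a hypothetical name used only for illustration; it is not part of PyCall.jl or Arrow.jl, and the plain Julia arrays stand in for buffers that would have to be obtained from Python without copying.

```julia
# Hypothetical wrapper type, for illustration only.
struct RaggedView{V<:AbstractVector,O<:AbstractVector{<:Integer}}
    values::V    # all elements of all sub-lists, stored back to back
    offsets::O   # 0-based offsets; list i spans values[offsets[i]+1 : offsets[i+1]]
end

Base.length(r::RaggedView) = length(r.offsets) - 1

# Each sub-list is a view into the shared values buffer, so nothing is copied.
Base.getindex(r::RaggedView, i::Integer) =
    view(r.values, (r.offsets[i] + 1):r.offsets[i + 1])

Base.iterate(r::RaggedView, i = 1) = i > length(r) ? nothing : (r[i], i + 1)

# The example from the top of the thread: [[1,2,3], [], [4,5]]
r = RaggedView([1, 2, 3, 4, 5], Int32[0, 3, 3, 5])

lengths = [length(x) for x in r]
totals  = [sum(x) for x in r]
```

Because indexing and iteration touch only Julia-side buffers, loops like the two comprehensions above never call into libpython.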

stevengj commented 3 years ago

Then Arrow.jl should define a convert method.