JuliaPy / PythonCall.jl

Python and Julia in harmony.
https://juliapy.github.io/PythonCall.jl/stable/
MIT License

TypeError: cannot pickle 'PyCapsule' object #454

Closed: ma-sadeghi closed this issue 4 months ago

ma-sadeghi commented 4 months ago

Affects: JuliaCall

Describe the bug

I'm trying to call a Julia function (from Python) that returns a vector of objects generated via pmap on multiple workers. (I'm not sure whether pmap is even relevant; I mention it just in case.)

Reproduce the bug

The error is thrown as soon as I call this Julia function:

https://github.com/MaximeVH/EquivalentCircuits.jl/blob/875522d8c7da5a774b20d497cb045177c97017cf/src/CircuitEvolution.jl#L358-L399

Error message

Python: TypeError: cannot pickle 'PyCapsule' object   core.py:432
Python stacktrace: none
Stacktrace:
[1] pythrow()
    @ PythonCall ~/.julia/packages/PythonCall/wXfah/src/err.jl:94
[2] errcheck
    @ ~/.julia/packages/PythonCall/wXfah/src/err.jl:10 [inlined]
[3] pycallargs
    @ ~/.julia/packages/PythonCall/wXfah/src/abstract/object.jl:210 [inlined]
[4] pycall(f::Py, args::Py; kwargs::@Kwargs{})
    @ PythonCall ~/.julia/packages/PythonCall/wXfah/src/abstract/object.jl:228
[5] pycall
    @ ~/.julia/packages/PythonCall/wXfah/src/abstract/object.jl:218 [inlined]
[6] Py
    @ ~/.julia/packages/PythonCall/wXfah/src/Py.jl:341 [inlined]
[7] serialize_py(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Py)
    @ PythonCall ~/.julia/packages/PythonCall/wXfah/src/compat/serialization.jl:9
[8] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Py)
    @ PythonCall ~/.julia/packages/PythonCall/wXfah/src/compat/serialization.jl:25
[9] serialize_any(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
    @ Serialization ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:676
[10] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
    @ Serialization ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:655
[11] serialize_any(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
    @ Serialization ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:676
[12] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
    @ Serialization ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:655
[13] serialize_msg(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, o::Distributed.CallMsg{:call_fetch})
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/messages.jl:78
[14] #invokelatest#2
    @ ./essentials.jl:887 [inlined]
[15] invokelatest
    @ ./essentials.jl:884 [inlined]
[16] send_msg_(w::Distributed.Worker, header::Distributed.MsgHeader, msg::Distributed.CallMsg{:call_fetch}, now::Bool)
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/messages.jl:181
[17] send_msg
    @ ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/messages.jl:122 [inlined]
[18] remotecall_fetch(f::Function, w::Distributed.Worker, args::Int64; kwargs::@Kwargs{})
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:460
[19] remotecall_fetch(f::Function, w::Distributed.Worker, args::Int64)
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:454
[20] remotecall_fetch(f::Function, id::Int64, args::Int64)
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:492
[21] remotecall_pool(rc_f::Function, f::Function, pool::Distributed.WorkerPool, args::Int64; kwargs::@Kwargs{})
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/workerpool.jl:126
[22] remotecall_pool
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/workerpool.jl:123 [inlined]
[23] remotecall_fetch
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/workerpool.jl:232 [inlined]
[24] #208
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/workerpool.jl:288 [inlined]
[25] (::Base.var"#1023#1028"{Distributed.var"#208#210"{Distributed.var"#208#209#211"{Distributed.WorkerPool, EquivalentCircuits.var"#177#178"{Int64, Int64, String, Int64, Float64, Nothing, Float64, Nothing, PyArray{ComplexF64, 1, true, true, ComplexF64}, PyArray{Float64, 1, true, true, Float64}}}}})(r::Base.RefValue{Any}, args::Tuple{Int64})
    @ Base ./asyncmap.jl:94
[26] (::Base.var"#1039#1040"{Base.var"#1023#1028"{Distributed.var"#208#210"{Distributed.var"#208#209#211"{Distributed.WorkerPool, EquivalentCircuits.var"#177#178"{Int64, Int64, String, Int64, Float64, Nothing, Float64, Nothing, PyArray{ComplexF64, 1, true, true, ComplexF64}, PyArray{Float64, 1, true, true, Float64}}}}}, Channel{Any}, Nothing})()
    @ Base ./asyncmap.jl:228
Stacktrace:
[1] (::Base.var"#1033#1035")(x::Task)
    @ Base ./asyncmap.jl:171
[2] foreach(f::Base.var"#1033#1035", itr::Vector{Any})
    @ Base ./abstractarray.jl:3094
[3] maptwice(wrapped_f::Function, chnl::Channel{Any}, worker_tasks::Vector{Any}, c::UnitRange{Int64})
    @ Base ./asyncmap.jl:171
[4] wrap_n_exec_twice
    @ ./asyncmap.jl:147 [inlined]
[5] #async_usemap#1018
    @ ./asyncmap.jl:97 [inlined]
[6] kwcall(::NamedTuple, ::typeof(Base.async_usemap), f::Any, c::Vararg{Any})
    @ Base ./asyncmap.jl:78 [inlined]
[7] #asyncmap#1017
    @ ./asyncmap.jl:75 [inlined]
[8] asyncmap
    @ ./asyncmap.jl:74 [inlined]
[9] pmap(f::Function, p::Distributed.WorkerPool, c::UnitRange{Int64}; distributed::Bool, batch_size::Int64, on_error::Nothing, retry_delays::Vector{Any}, retry_check::Nothing)
    @ Distributed ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/pmap.jl:126
[10] pmap
    @ ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/pmap.jl:99 [inlined]
[11] circuit_evolution_batch(measurements::PyArray{ComplexF64, 1, true, true, ComplexF64}, frequencies::PyArray{Float64, 1, true, true, Float64}; generations::Int64, population_size::Int64, terminals::String, head::Int64, cutoff::Float64, initial_population::Nothing, convergence_threshold::Float64, bounds::Nothing, numprocs::Int64, iters::Int64, quiet::Bool)
    @ EquivalentCircuits ~/Code/EquivalentCircuits.jl/src/CircuitEvolution.jl:381
[12] pyjlany_call(self::typeof(circuit_evolution_batch), args_::Py, kwargs_::Py)
    @ PythonCall ~/.julia/packages/PythonCall/wXfah/src/jlwrap/any.jl:34
[13] _pyjl_callmethod(f::Any, self_::Ptr{PythonCall.C.PyObject}, args_::Ptr{PythonCall.C.PyObject}, nargs::Int64)
    @ PythonCall ~/.julia/packages/PythonCall/wXfah/src/jlwrap/base.jl:69
[14] _pyjl_callmethod(o::Ptr{PythonCall.C.PyObject}, args::Ptr{PythonCall.C.PyObject})
    @ PythonCall.C ~/.julia/packages/PythonCall/wXfah/src/cpython/jlwrap.jl:47

Your system

Please provide detailed information about your system:

❯ pip list
Package                       Version      Editable project location
----------------------------- ------------ -------------------------
alabaster                     0.7.13
altair                        5.2.0
arviz                         0.17.0
asttokens                     2.4.1
attrs                         23.2.0
autoeis                       0.0.19       /home/amin/Code/AutoEIS
Babel                         2.13.0
beautifulsoup4                4.12.2
click                         8.1.7
comm                          0.2.1
contourpy                     1.2.0
cycler                        0.12.1
debugpy                       1.8.0
decorator                     5.1.1
dill                          0.3.8
docutils                      0.20.1
exceptiongroup                1.2.0
executing                     2.0.1
fonttools                     4.47.2
h5netcdf                      1.3.0
h5py                          3.10.0
imagesize                     1.4.1
impedance                     1.7.1
iniconfig                     2.0.0
ipykernel                     6.29.2
ipython                       8.20.0
ipywidgets                    8.1.1
jax                           0.4.23
jaxlib                        0.4.23
jedi                          0.19.1
Jinja2                        3.1.2
joblib                        1.3.2
jsonschema                    4.20.0
jsonschema-specifications     2023.12.1
julia                         0.6.2
juliacall                     0.9.15
juliapkg                      0.1.10
jupyter_client                8.6.0
jupyter_core                  5.7.1
jupyterlab-widgets            3.0.9
kiwisolver                    1.4.5
markdown-it-py                3.0.0
MarkupSafe                    2.1.3
matplotlib                    3.8.2
matplotlib-inline             0.1.6
mdurl                         0.1.2
ml-dtypes                     0.3.2
mpire                         2.9.0
multipledispatch              1.0.0
multiprocess                  0.70.16
nest-asyncio                  1.6.0
numpy                         1.26.3
numpyro                       0.13.2
opt-einsum                    3.3.0
packaging                     23.2
pandas                        2.1.4
parso                         0.8.3
pexpect                       4.9.0
pillow                        10.2.0
pip                           23.3.2
platformdirs                  4.2.0
pluggy                        1.4.0
prompt-toolkit                3.0.43
psutil                        5.9.8
ptyprocess                    0.7.0
pure-eval                     0.2.2
Pygments                      2.16.1
pyparsing                     3.1.1
pytest                        8.0.0
python-dateutil               2.8.2
pytz                          2023.3.post1
pyzmq                         25.1.2
referencing                   0.32.1
rich                          13.7.0
rpds-py                       0.17.1
ruff                          0.2.1
scikit-learn                  1.4.0
scipy                         1.11.4
seaborn                       0.13.1
semantic-version              2.10.0
setuptools                    69.0.3
six                           1.16.0
snowballstemmer               2.2.0
soupsieve                     2.5
Sphinx                        7.2.6
sphinx-basic-ng               1.0.0b2
sphinxcontrib-applehelp       1.0.7
sphinxcontrib-devhelp         1.0.5
sphinxcontrib-htmlhelp        2.0.4
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.6
sphinxcontrib-serializinghtml 1.1.9
stack-data                    0.6.3
threadpoolctl                 3.2.0
tomli                         2.0.1
toolz                         0.12.0
tornado                       6.4
tqdm                          4.66.1
traitlets                     5.14.1
typing_extensions             4.9.0
tzdata                        2023.4
wcwidth                       0.2.13
wheel                         0.42.0
widgetsnbextension            4.0.9
xarray                        2023.12.0
xarray-einstats               0.6.0

>>> juliacall.Pkg.status()
Status `~/mambaforge/envs/autoeis/julia_env/Project.toml`
  [da5bd070] EquivalentCircuits v0.3.1 `~/Code/EquivalentCircuits.jl`
  [6099a3de] PythonCall v0.9.15

MilesCranmer commented 4 months ago

moved from https://github.com/MilesCranmer/PySR/pull/535:

@ma-sadeghi: @MilesCranmer Was there any trick to make Julia multiprocessing work with JuliaCall? I used to use PyJulia, and calling a Julia function that uses Julia's multiprocessing worked fine. Since switching to JuliaCall, I can no longer call that function (JuliaPy/PythonCall.jl/issues/454).

I've tried a workaround, which is to use Python multiprocessing to call the serial version of that function in Julia, but that didn't work either (JuliaPy/PythonCall.jl/issues/455).

I'd appreciate any pointers/tips. Thanks!

It doesn't seem to be an issue for me because of the way SymbolicRegression.jl uses multiprocessing: it calls addprocs from within the Julia code, and none of the data it puts on the Julia workers is accessible from Python.

mkitti commented 4 months ago

The issue is a serialization issue. The closest fix I can see is to manually figure out how to serialize the arguments.

ma-sadeghi commented 4 months ago

@mkitti Thanks. One question: is the serialization on the Python side or the Julia side? I'm a bit confused: I can successfully call the serial version of the same function from Python, so the output object type seems to be able to travel from Julia to Python. The parallel version returns a Vector of that object. Is the fact that the output is a Vector causing the issue?

mkitti commented 4 months ago

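I do not mean "serialization" in terms of "serial" vs "parallel". I mean converting all the arguments into a byte stream that can be saved or sent to another process and then reloaded. That's what happens when you use pmap, which has to send the arguments to the workers. That's why pickle is involved.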
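
To illustrate what serialization means here, the following is a minimal Python sketch that does not use JuliaCall or PythonCall at all. It shows pickle turning plain data into bytes and back, and failing on an object that wraps a raw C pointer. The helper name `try_pickle` is hypothetical, and `datetime.datetime_CAPI` is used only because it is a PyCapsule that ships with CPython; it is an assumption that the failing object in this issue behaves the same way.

```python
import datetime
import pickle


def try_pickle(obj):
    """Return (True, restored_copy) if obj survives a pickle round trip,
    otherwise (False, error_message)."""
    try:
        data = pickle.dumps(obj)          # object -> bytes
        return True, pickle.loads(data)   # bytes -> new, equal object
    except (TypeError, pickle.PicklingError) as err:
        return False, str(err)


# Plain Python data round-trips without trouble.
ok, restored = try_pickle([1.0, 2.5, 3.0])
print(ok, restored)

# A PyCapsule wraps a raw C pointer, so pickle refuses to serialize it.
# datetime.datetime_CAPI is a capsule available in CPython (assumed here).
ok, message = try_pickle(datetime.datetime_CAPI)
print(ok, message)
```

The second call is expected to report a failure similar to the `TypeError` in the error message above, since both involve pickling a PyCapsule.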

ma-sadeghi commented 4 months ago

Yeah, I understand. Forgive my ignorance of how JuliaCall (or other language interop tools) works internally. What I meant was that pmap is called inside the Julia function, so the serialization is done by Julia. Why is it that JuliaCall can't transfer the Vector back when it is returned? I'm clearly missing how the interop is done, and I was just curious which part I'm not getting.

mkitti commented 4 months ago

The error occurs with Python serialization, pickle. Attempt to serialize the arguments yourself with pickle rather than trying to pass them.
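
One way to follow this suggestion is to pickle each argument separately and see which one fails. The sketch below is only an illustration under stated assumptions: `find_unpicklable` is a hypothetical helper, not part of JuliaCall, and the `threading.Lock` stands in for any object that wraps native state and cannot be pickled.

```python
import pickle
import threading


def find_unpicklable(**named_args):
    """Return the names of the arguments that pickle cannot serialize."""
    failures = []
    for name, value in named_args.items():
        try:
            pickle.dumps(value)
        except Exception:  # pickle raises several exception types
            failures.append(name)
    return failures


# Stand-in arguments: a list pickles fine, a lock does not.
bad = find_unpicklable(frequencies=[1.0, 10.0, 100.0], handle=threading.Lock())
print(bad)  # prints ['handle']
```

Running a check like this on the real arguments before calling into Julia narrows the problem down to a specific input, which is the kind of information a minimal reproduction needs.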

ma-sadeghi commented 4 months ago

@mkitti Thanks for the pointers, they helped me pin down the issue. The issue was the numpy arrays (measurements and frequencies) that were being passed as input. I just added:

measurements = Array(measurements)
frequencies = Array(frequencies)

to the Julia function and it seems to have made the serializer happy.

Just curious, is this expected behaviour for numpy arrays? Or is it a bug?

mkitti commented 4 months ago

Sounds kind of buggy, but I'm not sure. Now that you know this, try to create the simplest possible minimal working example. It might be useful to post this as a new issue.

ma-sadeghi commented 4 months ago

There you go: #459 cc: @mkitti

cjdoris commented 4 months ago

This is now fixed on main. The underlying issue was that PyArray was not serializable. That said, unless you really do need a PyArray, I'd recommend just converting it to Array.