JuliaPy / PythonCall.jl

Python and Julia in harmony.
https://juliapy.github.io/PythonCall.jl/stable/
MIT License

Cannot create pandas DataFrame from Julia DataFrame #322

Closed by schlichtanders 1 year ago

schlichtanders commented 1 year ago

Affects: JuliaCall

Describe the bug

I get the error JuliaError: MethodError: no method matching iterate(::Symbol), which also happens when I simply iterate over the pairs iterator from Python. Here is the example in Python:

from juliacall import Main as jl
import pandas as pd

jl.seval("using DataFrames")  # DataFrames.jl must be installed in juliacall's Julia environment
jl.seval("""
df = DataFrame(grp=repeat(1:2, 3), x=6:-1:1, y=4:9, z=[3:7; missing], id='a':'f')
""")
pd.DataFrame(jl.pairs(jl.eachcol(jl.df)))

raises

---------------------------------------------------------------------------
JuliaError                                Traceback (most recent call last)
Cell In[78], line 1
----> 1 pd.DataFrame(jl.pairs(jl.eachcol(jl.df2)))

File ~/Projects/Jolin.io/workshop-accelerate-Python-with-Julia/.venv/lib/python3.10/site-packages/pandas/core/frame.py:781, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    779     if columns is not None:
    780         columns = ensure_index(columns)
--> 781     arrays, columns, index = nested_data_to_arrays(
    782         # error: Argument 3 to "nested_data_to_arrays" has incompatible
    783         # type "Optional[Collection[Any]]"; expected "Optional[Index]"
    784         data,
    785         columns,
    786         index,  # type: ignore[arg-type]
    787         dtype,
    788     )
    789     mgr = arrays_to_mgr(
    790         arrays,
    791         columns,
   (...)
    794         typ=manager,
    795     )
    796 else:

File ~/Projects/Jolin.io/workshop-accelerate-Python-with-Julia/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py:498, in nested_data_to_arrays(data, columns, index, dtype)
    495 if is_named_tuple(data[0]) and columns is None:
    496     columns = ensure_index(data[0]._fields)
--> 498 arrays, columns = to_arrays(data, columns, dtype=dtype)
    499 columns = ensure_index(columns)
    501 if index is None:

File ~/Projects/Jolin.io/workshop-accelerate-Python-with-Julia/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py:837, in to_arrays(data, columns, dtype)
    834     arr, columns = _list_of_series_to_arrays(data, columns)
    835 else:
    836     # last ditch effort
--> 837     data = [tuple(x) for x in data]
    838     arr = _list_to_arrays(data)
    840 content, columns = _finalize_columns_and_data(arr, columns, dtype)

File ~/Projects/Jolin.io/workshop-accelerate-Python-with-Julia/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py:837, in (.0)
    834     arr, columns = _list_of_series_to_arrays(data, columns)
    835 else:
    836     # last ditch effort
--> 837     data = [tuple(x) for x in data]
    838     arr = _list_to_arrays(data)
    840 content, columns = _finalize_columns_and_data(arr, columns, dtype)

File ~/.julia/packages/PythonCall/dsECZ/src/jlwrap/iter.jl:37, in __next__(self)
     35         return self
     36     def __next__(self):
---> 37         return self._jl_callmethod($(pyjl_methodnum(pyjliter_next)))
     38 """, @__FILE__(), "exec"), jl.__dict__)
     39 pycopy!(pyjlitertype, jl.IteratorValue)

JuliaError: MethodError: no method matching iterate(::Symbol)

Closest candidates are:
  iterate(!Matched::Union{LinRange, StepRangeLen})
   @ Base range.jl:880
  iterate(!Matched::Union{LinRange, StepRangeLen}, !Matched::Integer)
   @ Base range.jl:880
  iterate(!Matched::Union{LinearAlgebra.Eigen, LinearAlgebra.GeneralizedEigen})
   @ LinearAlgebra /nix/store/i6jayqiqfw6h8inkhqigkarv2gjar02a-julia-bin-1.9.0/share/julia/stdlib/v1.9/LinearAlgebra/src/eigen.jl:122
  ...

Stacktrace:
 [1] pyjliter_next(self::PythonCall.Iterator)
   @ PythonCall ~/.julia/packages/PythonCall/dsECZ/src/jlwrap/iter.jl:14
 [2] _pyjl_callmethod(f::Any, self_::Ptr{PythonCall.C.PyObject}, args_::Ptr{PythonCall.C.PyObject}, nargs::Int64)
   @ PythonCall ~/.julia/packages/PythonCall/dsECZ/src/jlwrap/base.jl:57
 [3] _pyjl_callmethod(o::Ptr{PythonCall.C.PyObject}, args::Ptr{PythonCall.C.PyObject})
   @ PythonCall.C ~/.julia/packages/PythonCall/dsECZ/src/cpython/jlwrap.jl:47


schlichtanders commented 1 year ago

I found the following workaround, which converts to a dict twice (first to a Julia Dict, then to a Python dict):

pd.DataFrame(dict(jl.Dict(jl.pairs(jl.eachcol(jl.df)))))

At least it is still more or less a one-liner 😄

cjdoris commented 1 year ago

PythonCall has a function for this: jl.pytable.

schlichtanders commented 1 year ago

Impressive; I had assumed it worked exactly the other way around. I looked through many Discourse threads and re-scanned the docs, but I didn't come across this. Thank you for the link! I see now that I was looking in the wrong part of the documentation: I only inspected the Python side and the Julia side, and never looked for a separate compatibility page.

I think PyTable is listed on the PythonCall subpage, but pytable is not yet listed on the JuliaCall subpage. It would be great to add it there, so that people searching the reference for pandas or DataFrame can find it.

schlichtanders commented 1 year ago

I am running into the very same problem when I want to send a Julia Dict to Python.

pandas behaves strangely with the JuliaDict wrapper, so I would really like to understand the recommended way to convert a Julia dict to a Python dict (I couldn't find it so far).

cjdoris commented 1 year ago

pydict 🙂

schlichtanders commented 1 year ago

That makes a lot of sense, thank you.

Let me summarize how I understand PythonCall/JuliaCall:

  1. No implicit conversions happen.
  2. However, standard types are wrapped in equivalent Python/Julia wrappers, which behave like normal Python/Julia objects in many cases.
  3. Sometimes this fails, in which case you need an explicit conversion (e.g. pytable or pydict).
    • This kind of failure is not easily preventable; it is systematic. As in the cases above, it happens because the types differ: if no implicit conversion takes place and only wrappers are used, the wrapper necessarily has a different type than the native Python/Julia object, so code that depends on the exact type can fail.
    • It would probably help people to have a highlighted warning in the docs about this expected class of failures and the standard ways to deal with them.

cjdoris commented 1 year ago

That sounds about right. I'm always happy to accept PRs that improve the docs if you think something could be clearer.