JuliaStrings / InlineStrings.jl

Fixed-width string types for Julia

interoperability with round-tripping through data format broken on latest release (1.4.1) #76

Closed adienes closed 4 months ago

adienes commented 4 months ago
# julia

julia> using Arrow, InlineStrings, Tables

julia> Arrow.write("mwe.arrow", Tables.rowtable((; a = String3["xyz", "123"])))
# python

>>> import polars as pl
>>> pl.read_ipc("mwe.arrow")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/lib/python3.11/site-packages/polars/utils/deprecation.py", line 133, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/polars/utils/deprecation.py", line 133, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/polars/io/ipc/functions.py", line 103, in read_ipc
    return pl.DataFrame._read_ipc(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/polars/dataframe/frame.py", line 990, in _read_ipc
    self._df = PyDataFrame.read_ipc(
               ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: cannot create series from Extension("JuliaLang.InlineStrings.InlineString3", Utf8, Some(""))

ericphanson commented 4 months ago

I created a simple PyArrow.jl wrapper to try to help set up inter-language tests for packages like this. But here I'm getting:

using InlineStrings, Arrow, PyArrow, Tables
const ft = pyimport("pyarrow.feather")

Arrow.write("mwe.arrow", Tables.rowtable((; a = String3["xyz", "123"])))
ft.read_table("mwe.arrow")

gives me:

Python:
pyarrow.Table
a: string not null
----
a: [["xyz","123"]]

which seems fine.

So I wonder if it's a polars issue, in that they are not ignoring unknown extension types?
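
For readers unfamiliar with Arrow extension types, here is a minimal sketch of the fallback being discussed. It is not how pyarrow or polars is actually implemented. In the Arrow format an extension type is an ordinary storage type (here utf8) plus two field metadata keys, `ARROW:extension:name` and `ARROW:extension:metadata`, and a reader that does not recognize the name can simply use the storage type. The function `resolve_field_type` and the `known` registry are hypothetical names made up for illustration, and types are represented as plain strings.

```python
# Metadata keys defined by the Arrow columnar format for extension types.
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"


def resolve_field_type(storage_type, field_metadata, known=None):
    """Pick the logical type for a field (hypothetical helper).

    storage_type:   the underlying Arrow type, as a plain string here
    field_metadata: dict of the field's custom metadata
    known:          hypothetical registry mapping extension names to
                    callables (storage_type, serialized_metadata) -> type
    """
    known = known or {}
    name = field_metadata.get(EXTENSION_NAME_KEY)
    if name is None:
        # Not an extension field at all.
        return storage_type
    handler = known.get(name)
    if handler is None:
        # Unknown extension: ignore it and fall back to the storage type
        # rather than raising an error.
        return storage_type
    return handler(storage_type, field_metadata.get(EXTENSION_METADATA_KEY, ""))


# Field metadata matching the extension named in the polars error message.
field_metadata = {
    EXTENSION_NAME_KEY: "JuliaLang.InlineStrings.InlineString3",
    EXTENSION_METADATA_KEY: "",
}

print(resolve_field_type("utf8", field_metadata))
```

With an empty registry the column resolves to plain utf8, which matches the `a: string not null` result from pyarrow above. A reader that raises in the unknown-extension branch would instead produce an error like the one polars reports.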

ericphanson commented 4 months ago

hm, looks like it: https://github.com/pola-rs/polars/issues/9112#issuecomment-1568995370

adienes commented 4 months ago

ah I see. thanks for looking into it --- I'll bump the issue over there