delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

pyarrow `DictionaryArray` as partition column for `write_deltalake` fails #2969

Open jorritsandbrink opened 2 weeks ago

jorritsandbrink commented 2 weeks ago

Environment

Delta-rs version: 0.21.0

Binding: python

Environment: local, WSL2, Ubuntu 24.04.1 LTS


Bug

What happened: `write_deltalake` raises `_internal.DeltaError: Generic DeltaTable error: Missing partition column: failed to parse` when the partition column is a pyarrow `DictionaryArray`.

What you expected to happen: Successful write.

How to reproduce it:

import pyarrow as pa
from deltalake import write_deltalake

# pyarrow.lib.DictionaryArray
array = pa.array(["a", "b", "c"], type=pa.dictionary(pa.int8(), pa.string()))

data = {
    "foo": [1, 2, 3],
    "bar": [1, 2, 3],
    "baz": array,
    # "baz": ["a", "b", "c"],  # using this instead works
}
table = pa.table(data)

# write to partitioned delta table
write_deltalake("my_delta_table", table, partition_by="baz")

# _internal.DeltaError: Generic DeltaTable error: Missing partition column: failed to parse

More details:

Traceback (most recent call last):
  File "/home/j/repos/dlt/mre.py", line 16, in <module>
    write_deltalake("my_delta_table", table, partition_by="baz")
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/deltalake/writer.py", line 323, in write_deltalake
    write_deltalake_rust(
_internal.DeltaError: Generic DeltaTable error: Missing partition column: failed to parse

leanadah commented 9 hours ago

Hi there, has this been resolved?