delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Invalid data type for Delta Lake: Dictionary(Int32, Utf8) #1546

Closed yefetBenTili closed 10 months ago

yefetBenTili commented 1 year ago

Environment

Delta-rs version: 0.10.0

Binding:

Environment:


Exception: Schema error: :

I am trying to append to an already existing Delta Lake table in S3 using delta-rs:

import os

import pyarrow.parquet as pq
from deltalake.writer import write_deltalake

# Read the source Parquet data into a pyarrow Table
df = pq.read_table('data')

# Credentials are taken from the environment
storage_options = {
    "AWS_DEFAULT_REGION": "eu-central-1",
    "AWS_ACCESS_KEY_ID": os.environ["AWS_ACCESS_KEY_ID"],
    "AWS_SECRET_ACCESS_KEY": os.environ["AWS_SECRET_ACCESS_KEY"],
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}

destination = "s3://some_s3_location"
write_deltalake(
    destination,
    df,
    mode="append",
    storage_options=storage_options,
    partition_by=["titles", "train_title", "date"],
)

The data gets written to S3, but I still get this exception: Invalid data type for Delta Lake: Dictionary(Int32, Utf8)

Any clue why this is happening? I played around with it a little bit, and it seems there is something wrong with the data type of the partition key folders.

cmackenzie1 commented 1 year ago

I believe this is the same issue as #1445, which will be fixed in #1481

ion-elgreco commented 11 months ago

@yefetBenTili are you still having this bug?