delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Unable to read delta table created using Uniform #2578

Closed jeppe742 closed 4 months ago

jeppe742 commented 5 months ago

Environment

Delta-rs version: 0.17.4

Binding: Python

Environment:


Bug

What happened: We are investigating using Delta Uniform to have our Spark jobs also write Iceberg metadata. To enable the generation of Iceberg metadata, you have to set the delta.enableIcebergCompatV2 property on the table. Once this is set, the Delta transaction log includes additional information, such as the column mapping configuration and per-column mapping metadata.

E.g., if you run the example from the Uniform documentation:

CREATE TABLE uniform_table(c1 INT) USING DELTA TBLPROPERTIES(
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg');

You will get a Delta transaction log entry that looks something like this:

{"commitInfo":{"timestamp":1717753754287,"operation":"CREATE TABLE","operationParameters":{"isManaged":"true","description":null,"partitionBy":"[]","properties":"{\"delta.enableIcebergCompatV2\":\"true\",\"delta.universalFormat.enabledFormats\":\"iceberg\",\"delta.columnMapping.mode\":\"name\",\"delta.columnMapping.maxColumnId\":\"1\"}"},"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{},"engineInfo":"Apache-Spark/3.5.1 Delta-Lake/3.1.0","txnId":"a4d4593f-835c-4d00-81d8-27c1103343d2"}}
{"metaData":{"id":"a8477f73-f004-4a08-8397-3420d4df98a2","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"c1\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"delta.columnMapping.id\":1,\"delta.columnMapping.nested.ids\":{},\"delta.columnMapping.physicalName\":\"col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd\"}}]}","partitionColumns":[],"configuration":{"delta.enableIcebergCompatV2":"true","delta.universalFormat.enabledFormats":"iceberg","delta.columnMapping.mode":"name","delta.columnMapping.maxColumnId":"1"},"createdTime":1717753754108}}
{"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV2"]}}

If you try to read this table, you get the following error:

_internal.DeltaProtocolError: Invalid JSON in file stats: data did not match any variant of untagged enum MetadataValue at line 1 column 147

The cause seems to be that Delta adds "delta.columnMapping.nested.ids":{} to the column metadata inside the metaData action's schemaString, and the delta kernel doesn't support nested objects as metadata values.
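To make it easier to spot which metadata entry trips the parser, here is a minimal sketch that scans a metaData action for column metadata values that are JSON objects or arrays. The helper name nested_metadata_keys is made up for illustration, the action is a trimmed copy of the log entry above, and only top-level columns are inspected.

```python
import json

# Trimmed copy of the metaData action from the transaction log above.
# Only the schemaString is kept; everything else is omitted for brevity.
META_ACTION = {
    "metaData": {
        "schemaString": json.dumps({
            "type": "struct",
            "fields": [{
                "name": "c1",
                "type": "integer",
                "nullable": True,
                "metadata": {
                    "delta.columnMapping.id": 1,
                    "delta.columnMapping.nested.ids": {},
                    "delta.columnMapping.physicalName":
                        "col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd",
                },
            }],
        })
    }
}


def nested_metadata_keys(action):
    """Return (column, key) pairs whose metadata value is a JSON object or array.

    Hypothetical helper: it only walks top-level fields of the schema.
    """
    schema = json.loads(action["metaData"]["schemaString"])
    found = []
    for field in schema["fields"]:
        for key, value in field.get("metadata", {}).items():
            if isinstance(value, (dict, list)):
                found.append((field["name"], key))
    return found


print(nested_metadata_keys(META_ACTION))
# prints [('c1', 'delta.columnMapping.nested.ids')]
```

Running it against a real table would mean feeding it the metaData line parsed from the relevant file in _delta_log.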

What you expected to happen: I should be able to read a Delta table written with Uniform enabled.

How to reproduce it:

  1. Create a table with Uniform (and Iceberg) enabled
    CREATE TABLE uniform_table(c1 INT) USING DELTA TBLPROPERTIES(
    'delta.enableIcebergCompatV2' = 'true',
    'delta.universalFormat.enabledFormats' = 'iceberg');
  2. Try to read the table
    from deltalake import DeltaTable
    DeltaTable("uniform_table")

    More details:
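If Spark is not at hand, the log shape can probably be reproduced by writing the commit file manually. The following is a rough sketch, not a verified reproduction: the helper name write_uniform_log is hypothetical, the commitInfo action is left out on the assumption that it is optional, and the table contains no data files. The read attempt only runs if deltalake is installed, and whether it fails depends on the installed version.

```python
import json
import tempfile
from pathlib import Path


def write_uniform_log(table_dir):
    """Write version 0 of a Delta log that mimics the Uniform commit above.

    Hypothetical helper; commitInfo is omitted (assumed optional).
    """
    schema = {
        "type": "struct",
        "fields": [{
            "name": "c1",
            "type": "integer",
            "nullable": True,
            "metadata": {
                "delta.columnMapping.id": 1,
                "delta.columnMapping.nested.ids": {},  # the nested value in question
                "delta.columnMapping.physicalName":
                    "col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd",
            },
        }],
    }
    actions = [
        {"metaData": {
            "id": "a8477f73-f004-4a08-8397-3420d4df98a2",
            "format": {"provider": "parquet", "options": {}},
            "schemaString": json.dumps(schema),
            "partitionColumns": [],
            "configuration": {
                "delta.enableIcebergCompatV2": "true",
                "delta.universalFormat.enabledFormats": "iceberg",
                "delta.columnMapping.mode": "name",
                "delta.columnMapping.maxColumnId": "1",
            },
            "createdTime": 1717753754108,
        }},
        {"protocol": {
            "minReaderVersion": 2,
            "minWriterVersion": 7,
            "writerFeatures": ["columnMapping", "icebergCompatV2"],
        }},
    ]
    log_dir = Path(table_dir) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    commit = log_dir / "00000000000000000000.json"
    # One JSON action per line, as in a regular Delta commit file.
    commit.write_text("\n".join(json.dumps(a) for a in actions) + "\n")
    return commit


with tempfile.TemporaryDirectory() as tmp:
    print("wrote", write_uniform_log(tmp).name)
    try:
        from deltalake import DeltaTable
    except ImportError:
        print("deltalake is not installed; skipping the read attempt")
    else:
        try:
            DeltaTable(tmp)
            print("table loaded without error")
        except Exception as exc:  # broad on purpose: the exact type may vary
            print("read failed:", exc)
```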

ion-elgreco commented 5 months ago

Can you try it against 0.18.0? If it still persists, then it deserves an upstream issue in the delta-kernel-rs repo.

jeppe742 commented 5 months ago

@ion-elgreco It's also an issue with 0.18.0. Will try to create an issue in the delta-kernel-rs repo 😃

jeppe742 commented 4 months ago

Hey @ion-elgreco Just fyi, the bug in delta-kernel-rs has finally been fixed and released in 0.2.0. Would it be possible to bump the dependency to get the fix?

ion-elgreco commented 4 months ago

Feel free to open a PR to bump it, then I'll approve