delta-io / delta-kernel-rs

A native Delta implementation for integration with any query engine
Apache License 2.0

`MetadataValue` schema doesn't support nested values, used by IcebergCompatV2 protocol #253

Closed jeppe742 closed 5 months ago

jeppe742 commented 5 months ago

When you create a Delta table with UniForm enabled, the first commit in the transaction log looks something like this:

```json
{"commitInfo":{"timestamp":1717753754287,"operation":"CREATE TABLE","operationParameters":{"isManaged":"true","description":null,"partitionBy":"[]","properties":"{\"delta.enableIcebergCompatV2\":\"true\",\"delta.universalFormat.enabledFormats\":\"iceberg\",\"delta.columnMapping.mode\":\"name\",\"delta.columnMapping.maxColumnId\":\"1\"}"},"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{},"engineInfo":"Apache-Spark/3.5.1 Delta-Lake/3.1.0","txnId":"a4d4593f-835c-4d00-81d8-27c1103343d2"}}
{"metaData":{"id":"a8477f73-f004-4a08-8397-3420d4df98a2","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"c1\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"delta.columnMapping.id\":1,\"delta.columnMapping.nested.ids\":{},\"delta.columnMapping.physicalName\":\"col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd\"}}]}","partitionColumns":[],"configuration":{"delta.enableIcebergCompatV2":"true","delta.universalFormat.enabledFormats":"iceberg","delta.columnMapping.mode":"name","delta.columnMapping.maxColumnId":"1"},"createdTime":1717753754108}}
{"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV2"]}}
```

Notice that the field-level `metadata` inside `metaData.schemaString` contains the following:

```json
{
    "metadata": {
        "delta.columnMapping.id": 1,
        "delta.columnMapping.nested.ids": {},
        "delta.columnMapping.physicalName": "col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd"
    }
}
```
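
To check a table for this, here is a small Python sketch. It is not part of the kernel, and the helper name `non_scalar_metadata` is made up for illustration. It decodes `schemaString`, which is itself a JSON document stored as a string, and lists the field metadata keys whose values are not a plain number, string or boolean. The sample line is a trimmed-down version of the `metaData` action, keeping only `schemaString`.

```python
import json

# Python types that correspond to a JSON number, string or boolean.
SCALAR_TYPES = (bool, int, float, str)


def non_scalar_metadata(action_line: str) -> dict:
    """Map each field name to the metadata keys whose values are not scalars."""
    action = json.loads(action_line)
    # schemaString is JSON encoded as a string, so it needs a second decode.
    schema = json.loads(action["metaData"]["schemaString"])
    offenders = {}
    for field in schema["fields"]:
        bad = [
            key
            for key, value in field.get("metadata", {}).items()
            if not isinstance(value, SCALAR_TYPES)
        ]
        if bad:
            offenders[field["name"]] = bad
    return offenders


# Trimmed sample built from the values shown in the issue.
schema = {
    "type": "struct",
    "fields": [
        {
            "name": "c1",
            "type": "integer",
            "nullable": True,
            "metadata": {
                "delta.columnMapping.id": 1,
                "delta.columnMapping.nested.ids": {},
                "delta.columnMapping.physicalName": "col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd",
            },
        }
    ],
}
line = json.dumps({"metaData": {"schemaString": json.dumps(schema)}})

print(non_scalar_metadata(line))
# prints {'c1': ['delta.columnMapping.nested.ids']}
```

Running the same helper over the `metaData` line of a real `_delta_log` commit should flag the same key on any column written with IcebergCompatV2 enabled.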

Currently the schema parser only accepts a number, string or boolean as a metadata value, not a nested object like the one we have for "delta.columnMapping.nested.ids": {} (see https://github.com/delta-incubator/delta-kernel-rs/blob/823367e4dc13b627914412ee2ca7933a1c7b822a/kernel/src/schema.rs#L20-L24).
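
As a rough illustration of the difference, here is a Python sketch of a scalar-only parser next to a relaxed one that keeps anything else as an opaque value. The function names and the `"other"` tag are hypothetical. This is not the kernel's Rust code and not necessarily how a fix would be implemented there.

```python
import json


def parse_strict(value):
    """Scalar-only model: accept a number, string or boolean, reject the rest."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return ("boolean", value)
    if isinstance(value, (int, float)):
        return ("number", value)
    if isinstance(value, str):
        return ("string", value)
    raise TypeError(f"unsupported metadata value: {value!r}")


def parse_relaxed(value):
    """Same as parse_strict, but keep unknown shapes instead of failing."""
    try:
        return parse_strict(value)
    except TypeError:
        return ("other", value)


metadata = json.loads(
    '{"delta.columnMapping.id": 1,'
    ' "delta.columnMapping.nested.ids": {},'
    ' "delta.columnMapping.physicalName": "col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd"}'
)

# The relaxed parser gets through all three keys.
relaxed = {key: parse_relaxed(value) for key, value in metadata.items()}
print(relaxed["delta.columnMapping.nested.ids"])

# The strict parser fails on the nested object.
try:
    parse_strict(metadata["delta.columnMapping.nested.ids"])
except TypeError as err:
    print("strict parser failed:", err)
```

Keeping the unknown value rather than dropping it means a reader that does not understand `delta.columnMapping.nested.ids` can still load the schema and pass the value along unchanged.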

This makes every Delta table written with Iceberg compatibility enabled through UniForm unreadable with the kernel. (See https://github.com/delta-io/delta-rs/issues/2578)

nicklan commented 5 months ago

Thanks for the report. #257 should fix this!

jeppe742 commented 5 months ago

Thanks @nicklan! Just out of curiosity, do we have an idea of when this will be included in a new release? 😃

nicklan commented 5 months ago

I need to verify that we haven't changed any APIs, but assuming we haven't, I'll get a 0.1.2 release out this week with this and a few other fixes.

nicklan commented 4 months ago

@jeppe742 Sorry for the long delay! We did change APIs, so I needed to do a 0.2.0 release, but it's now out with this fix included.