delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
7.62k stars 1.71k forks source link

[BUG] `columnMapping` readerFeatures is missing when icebergCompatV1(2) is enabled #3154

Open ebyhr opened 6 months ago

ebyhr commented 6 months ago

Bug

Which Delta project/connector is this regarding?

Describe the problem

Writer Requirements for Column Mapping mentions:

write a protocol action to add the feature columnMapping to both readerFeatures and writerFeatures

However, protocol entry doesn't have columnMapping in readerFeatures field.

Steps to reproduce

  1. Create a new table with icebergCompatV1 feature:

    CREATE TABLE default.test_iceberg
    (a INT)
    USING DELTA
    LOCATION 's3://test-bucket/test_iceberg'
    TBLPROPERTIES ('delta.enableIcebergCompatV1'=true);
  2. Confirm transaction log:

    
    {"commitInfo":{"timestamp":1716541145876,"operation":"CREATE TABLE","operationParameters":{"isManaged":"false","description":null,"partitionBy":"[]","properties":"{\"delta.enableIcebergCompatV1\":\"true\",\"delta.columnMapping.mode\":\"name\",\"delta.columnMapping.maxColumnId\":\"1\"}"},"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{},"engineInfo":"Apache-Spark/3.5.0 Delta-Lake/3.1.0","txnId":"b172aa56-9393-4750-ac30-2c1996497efb"}}
    {"metaData":{"id":"8d56ae44-1648-4fd5-a920-81f5f0725ab5","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"a\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"delta.columnMapping.id\":1,\"delta.columnMapping.physicalName\":\"col-39727da2-bc31-4d96-a0a9-b3f16113fde2\"}}]}","partitionColumns":[],"configuration":{"delta.enableIcebergCompatV1":"true","delta.columnMapping.mode":"name","delta.columnMapping.maxColumnId":"1"},"createdTime":1716541145862}}
    {"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV1"]}}


#### Observed results

<!-- What happened?  This could be a description, log output, etc. -->

#### Expected results

icebergCompatV1 or icebergCompatV2 exists in readerFeatures. 

#### Further details

<!--
Include any additional details that may be useful for diagnosing the problem here. If including tracebacks, please include the full traceback. Large logs and files should be attached.
-->

### Environment information

* Delta Lake version: 3.1.0
* Spark version: 3.5.0
* Scala version: 2.12

### Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

- [ ] Yes. I can contribute a fix for this bug independently.
- [ ] Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
- [x] No. I cannot contribute a bug fix at this time.
vkorukanti commented 6 months ago

cc. @lzlfred @harperjiang

ebyhr commented 5 months ago

@vkorukanti @lzlfred @harperjiang Could you let me know whether this is incorrect documentation or implementation bug?

harperjiang commented 5 months ago

@ebyhr the document said that

If the table is on Writer Version 5 or 6: write a metaData action to add the delta.columnMapping.mode table property;
If the table is on Writer Version 7:
write a protocol action to add the feature columnMapping to both readerFeatures and writerFeatures

I believe the behavior you observed is just as described in the doc when we have writer version 5/6, and thus don't consider it a bug.

ebyhr commented 5 months ago

The writer version is 7 and it mentions "add the feature columnMapping to both readerFeatures and writerFeatures".

{"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV1"]}}

Then, I would recommend updating the protocol. I expect readerFeatures always have columnMapping when it exists in writerFeatures from the current sentence.

felipepessoto commented 5 months ago

The writer version is 7 and it mentions "add the feature columnMapping to both readerFeatures and writerFeatures".

{"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV1"]}}

Then, I would recommend updating the protocol. I expect readerFeatures always have columnMapping when it exists in writerFeatures from the current sentence.

I think we need to update the protocol to make that paragraph clearer.

When reader version is 2 the readerFeatures doesn't exist, and if it does, it would break the protocol:

For new tables, when a new table is created with a Reader Version up to 2 and Writer Version 7, its protocol action must only contain writerFeatures.

felipepessoto commented 5 months ago

Actually, after running some experiment I realized Spark Delta doesn't allow you to enable column mapping if reader is version 2 and writer is version 7.

But it allows you to create it, so it is inconsistent, I can't say what is the correct behavior, my guess is supporting reader 2 + writer 7 (with columnMapping writerFeatures) should be the considered compliant with spec.

ALTER TABLE Test SET TBLPROPERTIES (
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '7'
  );

This fails:

ALTER TABLE Test SET TBLPROPERTIES (
    'delta.columnMapping.mode' = 'name'
  )
Your current table protocol version does not support changing column mapping modes
using delta.columnMapping.mode.

Required Delta protocol version for column mapping:
Protocol(3,7,[columnMapping],[columnMapping])
Your table's current Delta protocol version:
Protocol(2,7,None,[appendOnly,invariants])

Please enable Column Mapping on your Delta table with mapping mode 'name'.
You can use one of the following commands.

If your table is already on the required protocol version:
ALTER TABLE table_name SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name')

If your table is not on the required protocol version and requires a protocol upgrade:
ALTER TABLE table_name SET TBLPROPERTIES (
   'delta.columnMapping.mode' = 'name',
   'delta.minReaderVersion' = '3',
   'delta.minWriterVersion' = '7')

This works:

CREATE TABLE Test (Id INT) USING DELTA TBLPROPERTIES ('delta.minWriterVersion' = '7', 'delta.columnMapping.mode' = 'name');