apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[spark] exclude parquet-column and parquet-hadoop dependency #3669

Closed askwang closed 3 days ago

askwang commented 4 days ago

Purpose

Fixes test failures when running the tests in IntelliJ IDEA. The first is caused by a parquet-column dependency conflict:

java.lang.IllegalAccessError: tried to access method org.apache.parquet.io.ColumnIO.getRepetitionLevel()I from class org.apache.paimon.format.parquet.reader.ParquetSplitReaderUtil
    at org.apache.paimon.format.parquet.reader.ParquetSplitReaderUtil.constructField(ParquetSplitReaderUtil.java:382)
    at org.apache.paimon.format.parquet.reader.ParquetSplitReaderUtil.buildFieldsList(ParquetSplitReaderUtil.java:374)
    at org.apache.paimon.format.parquet.ParquetReaderFactory.createReader(ParquetReaderFactory.java:118)

The second is caused by a parquet-hadoop dependency conflict:

java.lang.NoSuchFieldError: LZ4_RAW
    at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:34)
    at org.apache.parquet.hadoop.ParquetWriter.<clinit>(ParquetWriter.java:45)
    at org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(ParquetWriter.java:357)
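To confirm which jar is supplying the conflicting classes, a small diagnostic along the following lines may help. This is only a sketch and not part of this PR: the class name `ClassOrigin` and the `locate` helper are hypothetical, and the default class names are simply the ones taken from the stack traces above. It loads each class without initializing it, so the static initializer that fails in the second trace is not triggered, and prints the location the class was loaded from.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports which jar or directory a class is loaded from. */
public class ClassOrigin {

    /** Returns the code source location of the named class, or a placeholder if unavailable. */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source
            return source == null ? "<bootstrap or unknown>" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Defaults are the classes named in the stack traces; pass other names as arguments
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.parquet.io.ColumnIO",
            "org.apache.parquet.hadoop.metadata.CompressionCodecName"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running this with the same classpath as the failing test should show whether the Parquet classes come from the Paimon bundle or from a separate parquet-column or parquet-hadoop jar pulled in transitively.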

Tests

API and Format

Documentation