apache/paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/

[Bug] #4568

Open ljingz opened 14 hours ago


Search before asking

- [x] I searched in the issues and found nothing similar.

Paimon version

0.9

Compute Engine

Spark

Minimal reproduce step

```sql
CREATE TABLE tmp.test1234 (
    id INT,
    order_id STRING,
    game_code STRING,
    is_delete TINYINT
) USING paimon
TBLPROPERTIES (
    'snapshot.time-retained' = '4 h',
    'snapshot.num-retained.min' = '1',
    'metastore.partitioned-table' = 'true',
    'dynamic-bucket.initial-buckets' = '1',
    'dynamic-bucket.target-row-num' = '6000000',
    'file.format' = 'parquet'
);

INSERT INTO tmp.test1234 VALUES (1, 'xxx', 'yyy', 1);

SELECT * FROM tmp.test1234 WHERE is_delete = 1;
```

What doesn't meet your expectations?
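
The query should simply return the inserted row (1, 'xxx', 'yyy', 1). Instead, the scan fails while the pushed-down `is_delete = 1` filter is evaluated against the Parquet row-group statistics: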

```
Caused by: java.lang.ClassCastException: java.lang.Byte cannot be cast to java.lang.Integer
	at org.apache.paimon.shade.org.apache.parquet.schema.PrimitiveComparator$IntComparator.compareNotNulls(PrimitiveComparator.java:85)
	at org.apache.paimon.shade.org.apache.parquet.schema.PrimitiveComparator.compare(PrimitiveComparator.java:63)
	at org.apache.paimon.shade.org.apache.parquet.column.statistics.Statistics.compareMinToValue(Statistics.java:388)
	at org.apache.paimon.shade.org.apache.parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:148)
	at org.apache.paimon.shade.org.apache.parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:67)
	at org.apache.paimon.shade.org.apache.parquet.filter2.predicate.Operators$Eq.accept(Operators.java:178)
	at org.apache.paimon.shade.org.apache.parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:410)
	at org.apache.paimon.shade.org.apache.parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:67)
	at org.apache.paimon.shade.org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:379)
	at org.apache.paimon.shade.org.apache.parquet.filter2.statisticslevel.StatisticsFilter.canDrop(StatisticsFilter.java:75)
	at org.apache.paimon.shade.org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:103)
	at org.apache.paimon.shade.org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
	at org.apache.paimon.shade.org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
	at org.apache.paimon.shade.org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
	at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:351)
	at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:250)
	at org.apache.paimon.format.parquet.ParquetReaderFactory.createReader(ParquetReaderFactory.java:106)
	at org.apache.paimon.format.parquet.ParquetReaderFactory.createReader(ParquetReaderFactory.java:72)
	at org.apache.paimon.io.FileRecordReader.<init>(FileRecordReader.java:82)
	at org.apache.paimon.operation.RawFileSplitRead.createFileReader(RawFileSplitRead.java:263)
	at org.apache.paimon.operation.RawFileSplitRead.lambda$createReader$1(RawFileSplitRead.java:169)
	at org.apache.paimon.mergetree.compact.ConcatRecordReader.create(ConcatRecordReader.java:53)
	at org.apache.paimon.operation.RawFileSplitRead.createReader(RawFileSplitRead.java:177)
	at org.apache.paimon.operation.RawFileSplitRead.createReader(RawFileSplitRead.java:144)
	at org.apache.paimon.table.AppendOnlyFileStoreTable$1.reader(AppendOnlyFileStoreTable.java:128)
	at org.apache.paimon.table.source.AbstractDataTableRead.createReader(AbstractDataTableRead.java:82)
	at org.apache.paimon.spark.PaimonPartitionReaderFactory.$anonfun$createReader$1(PaimonPartitionReaderFactory.scala:55)
	at org.apache.paimon.spark.PaimonPartitionReader.readSplit(PaimonPartitionReader.scala:90)
	at org.apache.paimon.spark.PaimonPartitionReader.<init>(PaimonPartitionReader.scala:42)
	at org.apache.paimon.spark.PaimonPartitionReaderFactory.createReader(PaimonPartitionReaderFactory.scala:56)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	... 3 more
```
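
My read of the trace, for what it's worth: Paimon writes TINYINT to Parquet as physical INT32, so the row-group statistics for `is_delete` are compared with Parquet's signed-INT32 comparator, which expects `java.lang.Integer` operands, while the pushed-down literal for `is_delete = 1` apparently reaches `StatisticsFilter` still boxed as a `java.lang.Byte`. The sketch below (written against the plain parquet-column API; inside Paimon the same classes live under `org.apache.paimon.shade`) reproduces the exact exception. It is only an illustration of the type mismatch, not Paimon code:

```java
import java.util.Comparator;

import org.apache.parquet.schema.PrimitiveComparator;

public class TinyintStatsMismatch {
    public static void main(String[] args) {
        // Parquet's comparator for signed INT32 statistics. Its generic
        // compareNotNulls implementation casts both operands to Integer,
        // which is the frame at the top of the stack trace above.
        @SuppressWarnings({"rawtypes", "unchecked"})
        Comparator int32 = PrimitiveComparator.SIGNED_INT32_COMPARATOR;

        // Integer vs Integer, as the comparator expects: prints a negative number.
        System.out.println(int32.compare(Integer.valueOf(0), Integer.valueOf(1)));

        // Byte vs Integer, which is what the trace suggests happens when the
        // TINYINT literal is not widened to the INT32 physical type: throws
        // "java.lang.ClassCastException: java.lang.Byte cannot be cast to
        // java.lang.Integer", matching the report above.
        int32.compare(Byte.valueOf((byte) 1), Integer.valueOf(1));
    }
}
```

If that reading is right, the fix presumably belongs in Paimon's predicate-to-Parquet conversion: widen TINYINT (and probably SMALLINT) literals to `Integer` before building the `FilterPredicate` that `ParquetReaderFactory` hands to `ParquetFileReader`. I have not confirmed where exactly the `Byte` sneaks in.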

Anything else?

No response

Are you willing to submit a PR?