apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Bug] Use the tag incremental query, file does not exist #1939

Open winfys opened 10 months ago

winfys commented 10 months ago

Search before asking

Paimon version

0.5

Compute Engine

Flink 1.16.1, Spark 3.3.1

Minimal reproduce step

Schema options:

"options" : {
  "owner" : "root",
  "partition.expiration-check-interval" : "1 d",
  "tag.automatic-creation" : "process-time",
  "tag.creation-period" : "daily",
  "partition.expiration-time" : "30 d",
  "bucket" : "50",
  "file.compression" : "ZSTD",
  "snapshot.time-retained" : "24 H",
  "bucket-key" : "#log_uuid",
  "partition.timestamp-formatter" : "yyyy-MM-dd",
  "file.format" : "parquet",
  "tag.num-retained-max" : "10",
  "metadata.stats-mode" : "none",
  "tag.creation-delay" : "20 m"
},

Spark SQL:

SELECT count(1) cnt FROM paimon_incremental_query('bdc_ods.ods_log_paimon_inc_1d', '2023-09-02', '2023-09-03');
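Before debugging the incremental read itself, it can help to confirm that both tags actually exist on the table. A hedged sketch, assuming a Spark session with the Paimon catalog configured and using Paimon's `$tags` system-table convention (column names may differ slightly between Paimon versions):

```sql
-- List the table's tags to confirm '2023-09-02' and '2023-09-03' are present
-- and see which snapshot each one pins:
SELECT tag_name, snapshot_id, commit_time
FROM `bdc_ods`.`ods_log_paimon_inc_1d$tags`;
```

If one of the two tags is missing or points to an expired snapshot, the incremental query between them can reference data files that have already been deleted.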

What doesn't meet your expectations?

Caused by: java.io.FileNotFoundException: File 'oss://bucket_dev/user/hive/warehouse/bdc_ods.db/ods_log_paimon_inc_1d/dt=2023-09-02/bucket-7/data-7d5e1fe1-55a5-4f23-9d0f-57e6d9d63063-2.parquet' not found. Possible causes:
1. Snapshot expires too fast; you can configure the 'snapshot.time-retained' option with a larger value.
2. Consumption is too slow; you can improve the performance of consumption (for example, by increasing parallelism).
    at org.apache.paimon.utils.FileUtils.createFormatReader(FileUtils.java:119)
    at org.apache.paimon.io.KeyValueDataFileRecordReader.<init>(KeyValueDataFileRecordReader.java:55)
    at org.apache.paimon.io.KeyValueFileReaderFactory.createRecordReader(KeyValueFileReaderFactory.java:95)
    at org.apache.paimon.mergetree.MergeTreeReaders.lambda$readerForRun$2(MergeTreeReaders.java:88)
    at org.apache.paimon.mergetree.compact.ConcatRecordReader.create(ConcatRecordReader.java:50)
    at org.apache.paimon.mergetree.MergeTreeReaders.readerForRun(MergeTreeReaders.java:91)
    at org.apache.paimon.mergetree.MergeTreeReaders.lambda$readerForSection$1(MergeTreeReaders.java:77)
    at org.apache.paimon.mergetree.MergeSorter.mergeSort(MergeSorter.java:119)
    at org.apache.paimon.mergetree.MergeTreeReaders.readerForSection(MergeTreeReaders.java:79)
    at org.apache.paimon.operation.KeyValueFileStoreRead.lambda$batchMergeRead$4(KeyValueFileStoreRead.java:235)
    at org.apache.paimon.mergetree.compact.ConcatRecordReader.create(ConcatRecordReader.java:50)
    at org.apache.paimon.operation.KeyValueFileStoreRead.batchMergeRead(KeyValueFileStoreRead.java:245)
    at org.apache.paimon.operation.KeyValueFileStoreRead.createReaderWithoutOuterProjection(KeyValueFileStoreRead.java:208)
    at org.apache.paimon.operation.KeyValueFileStoreRead.createReader(KeyValueFileStoreRead.java:182)
    at org.apache.paimon.table.source.KeyValueTableRead.createReader(KeyValueTableRead.java:51)
    at org.apache.paimon.spark.SparkReaderFactory.createReader(SparkReaderFactory.java:54)

Anything else?

No response

Are you willing to submit a PR?

JingsongLi commented 2 months ago

Can you query these two tags?
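One way to answer this is to time-travel to each tag individually. A hedged sketch, assuming Spark SQL time travel with Paimon, where `VERSION AS OF` accepts a tag name (exact syntax support may vary by Paimon/Spark version):

```sql
-- Read each tag on its own; if either query fails with the same
-- FileNotFoundException, that tag references already-deleted data files:
SELECT count(1) FROM bdc_ods.ods_log_paimon_inc_1d VERSION AS OF '2023-09-02';
SELECT count(1) FROM bdc_ods.ods_log_paimon_inc_1d VERSION AS OF '2023-09-03';
```

If both tags are readable in isolation but the incremental query between them fails, that would point at the incremental-read path rather than tag retention.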