apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Bug] Streaming read with `from-timestamp` may occasionally throw an exception (snapshot file not found) #3479

Open Mr-j-yangyu opened 3 weeks ago

Mr-j-yangyu commented 3 weeks ago

Search before asking

Paimon version

0.8

Compute Engine

Flink

Minimal reproduce step

Analyze the source code in SnapshotManager.class.

When the earliest snapshot file is read at step 2, the snapshot may already have expired and been deleted.

[Screenshot: 20240606165856]

What doesn't meet your expectations?

The query job fails with a FileNotFoundException.

Anything else?

No response

Are you willing to submit a PR?

Aitozi commented 2 weeks ago

@Mr-j-yangyu Have you tried pointing to a safer timestamp to read from (keeping a margin from the oldest snapshot/changelog)?

Mr-j-yangyu commented 2 weeks ago

> @Mr-j-yangyu Have you tried pointing to a safer timestamp to read from (keeping a margin from the oldest snapshot/changelog)?

@Aitozi Some usage scenarios need to read the earliest snapshot or changelog. Could we add logic that verifies the file exists when reading the earliest snapshot or changelog, and simply reads the next one if it does not? That would reduce the probability of the exception.
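
The "read the next one if it does not exist" idea could look roughly like the sketch below. It is illustrative only, not Paimon's actual code: the class `EarliestSnapshotReader`, the method names, and the `snapshot-<id>` file layout on a local filesystem are assumptions made for the example. Paimon itself goes through its own file abstraction. The sketch catches the missing-file error on read rather than checking existence first, since a separate existence check would leave the same race window open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

public class EarliestSnapshotReader {

    // Hypothetical file naming, used only for this sketch.
    static Path snapshotPath(Path dir, long id) {
        return dir.resolve("snapshot-" + id);
    }

    /**
     * Returns the content of the first snapshot in [earliestId, latestId]
     * that can still be read. A snapshot that was expired and deleted
     * between listing and reading is skipped instead of failing the job.
     */
    static Optional<String> readEarliestAvailable(Path dir, long earliestId, long latestId)
            throws IOException {
        for (long id = earliestId; id <= latestId; id++) {
            try {
                return Optional.of(Files.readString(snapshotPath(dir, id)));
            } catch (NoSuchFileException e) {
                // The snapshot was deleted concurrently; fall through to the next id.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snapshots");
        // Snapshot 1 is deliberately missing, simulating expiration.
        Files.writeString(snapshotPath(dir, 2), "snapshot-2-content");
        Files.writeString(snapshotPath(dir, 3), "snapshot-3-content");
        System.out.println(readEarliestAvailable(dir, 1, 3).orElse("none"));
    }
}
```

An empty result means every candidate disappeared before it could be read, so the caller still has to decide whether to retry or fail. This lowers the chance of the exception but does not remove it.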