alibaba / fluss

Fluss is a streaming storage built for real-time analytics.
https://alibaba.github.io/fluss-docs
Apache License 2.0
695 stars, 150 forks

[Feature] Support batch read table without datalake enabled #40

Open luoyuxia opened 5 days ago

luoyuxia commented 5 days ago

Search before asking

Motivation

Currently, Fluss does not support batch reads of a table that does not have datalake enabled. Even when a table has datalake enabled, a batch read still fails if the lake snapshot does not exist yet, which forces users to wait until a snapshot becomes available.

We should handle both cases to give users a better experience.

Solution

1: If the table does not have datalake enabled: for a primary key table, we should merge the SSTs and the changelogs to read the full data; for a log table, we can simply read the logs up to the latest offset.

2: If the table has datalake enabled but the snapshot is not available yet, we can treat it as a table without datalake enabled and batch read it the same way.
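To make the two read paths concrete, here is a minimal sketch in plain Java. It does not use Fluss APIs: `BatchReadSketch`, `Change`, `ChangeType`, `LogRecord`, and both method names are hypothetical, and the SST state is modeled as an in-memory map. It also assumes that the snapshot already contains every change below `snapshotOffset`, that the changelog is ordered by offset, and that `endOffset` is the latest offset captured when the batch read starts (exclusive).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these types are Fluss classes.
final class BatchReadSketch {

    // Assumed change types; the real changelog encoding may differ.
    enum ChangeType { INSERT, UPDATE_AFTER, DELETE }

    // One changelog entry: its log offset, the kind of change, and the affected row.
    record Change(long offset, ChangeType type, String key, String value) {}

    // One entry of an append-only log table.
    record LogRecord(long offset, String value) {}

    /**
     * Primary key table: start from the snapshot (the merged SST state) and
     * replay changelog entries with offsets in [snapshotOffset, endOffset).
     */
    static Map<String, String> readPrimaryKeyTable(
            Map<String, String> snapshot,
            long snapshotOffset,
            List<Change> changelog,
            long endOffset) {
        Map<String, String> rows = new TreeMap<>(snapshot);
        for (Change change : changelog) {
            if (change.offset() < snapshotOffset || change.offset() >= endOffset) {
                continue; // already in the snapshot, or written after the read started
            }
            if (change.type() == ChangeType.DELETE) {
                rows.remove(change.key());
            } else {
                rows.put(change.key(), change.value()); // insert or overwrite
            }
        }
        return rows;
    }

    /** Log table: return every record below endOffset, in log order. */
    static List<String> readLogTable(List<LogRecord> log, long endOffset) {
        List<String> values = new ArrayList<>();
        for (LogRecord record : log) {
            if (record.offset() < endOffset) {
                values.add(record.value());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = Map.of("a", "1", "b", "2");
        List<Change> changelog = List.of(
                new Change(5, ChangeType.UPDATE_AFTER, "a", "10"),
                new Change(6, ChangeType.DELETE, "b", null),
                new Change(7, ChangeType.INSERT, "c", "3"),
                new Change(8, ChangeType.INSERT, "d", "4")); // at endOffset, so skipped
        System.out.println(readPrimaryKeyTable(snapshot, 5, changelog, 8)); // {a=10, c=3}

        List<LogRecord> log = List.of(
                new LogRecord(0, "x"), new LogRecord(1, "y"), new LogRecord(2, "z"));
        System.out.println(readLogTable(log, 2)); // [x, y]
    }
}
```

Fixing `endOffset` before the read begins is what makes the result bounded: records appended later are ignored, so the batch read terminates even while the table keeps receiving writes. Case 2 in the list above would reuse the same two paths whenever no lake snapshot is found.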

Anything else?

No response

Willingness to contribute

loserwang1024 commented 2 days ago

I'd like to do it.