datafuselabs / databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.43k stars 715 forks

Tracking issues of Data Lake with Iceberg Support #12272

Open Xuanwo opened 11 months ago

Xuanwo commented 11 months ago

After the close of https://github.com/datafuselabs/databend/issues/11947, Databend has completed all preparation work required for implementing data lake support!

Databend now has multi-catalog support!

We can create a new catalog like:

CREATE CATALOG iceberg_ctl
TYPE=ICEBERG
CONNECTION=(
    URL='s3://testbucket/iceberg_ctl/'
    AWS_KEY_ID='minioadmin'
    AWS_SECRET_KEY='minioadmin'
    ENDPOINT_URL='${STORAGE_S3_ENDPOINT_URL}'
);

And we can show/drop them:

SHOW DATABASES IN iceberg_ctl;
SHOW TABLES IN iceberg_ctl.iceberg_db;
DROP CATALOG IF EXISTS iceberg_ctl;

Databend can now read existing Iceberg tables!

We can query data in an existing Iceberg table like the following:

SELECT count(*) FROM iceberg_ctl.iceberg_db.iceberg_tbl;
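As a rough sketch of what a fuller read-only session might look like, the statements below list the registered catalogs, preview a few rows, and run a simple aggregation. The catalog, database, and table names are the ones used above; the column `event_date` is hypothetical and only stands in for whatever columns the table actually has. This also assumes a Databend version in which `SHOW CATALOGS` is available.

```sql
-- List registered catalogs; iceberg_ctl should appear after CREATE CATALOG.
SHOW CATALOGS;

-- Preview a few rows using the fully qualified catalog.database.table name.
SELECT * FROM iceberg_ctl.iceberg_db.iceberg_tbl LIMIT 10;

-- Aggregate over a hypothetical column `event_date` (not taken from this issue).
SELECT event_date, count(*) AS row_count
FROM iceberg_ctl.iceberg_db.iceberg_tbl
GROUP BY event_date
ORDER BY event_date;
```

All three statements only read data, which matches the current scope of reading existing Iceberg tables.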

We have found a way to add data lake features in Databend. I have some ideas that we can start working on:

Tasks

Our current goal is to make reading from Iceberg tables fast and reliable.

Future

chrisfw commented 7 months ago

Hi @Xuanwo, this is an exciting feature! I was wondering, though, whether the initial implementation supports Iceberg's temporal/as-of queries?

Regards, Chris Whelan

atifiu commented 2 months ago

Does Databend currently support querying Iceberg tables that are partitioned on a timestamp column with a day/month/year transform, or is that what the task "Implement partition for iceberg table" covers?