apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0
958 stars 302 forks

[FEATURE] Add a JDBC catalog of DuckDB #1358

Open qqqttt123 opened 9 months ago

qqqttt123 commented 9 months ago

Describe the feature

DuckDB is a popular, lightweight database for OLAP.

Motivation

DuckDB is a simple and effective database, and a growing number of users rely on it to analyze data.

Describe the solution

We can refer to the existing database catalog implementations.

Additional context

No response

henriquepaes1 commented 8 months ago

Hello! I feel like I could take on this issue. If this issue is assigned to me, is there any initial bibliography I should read regarding DuckDB?

qqqttt123 commented 8 months ago

Hello! I feel like I could take on this issue. If this issue is assigned to me, is there any initial bibliography I should read regarding DuckDB?

You can refer to the DuckDB SQL grammar, the JDBC documentation at https://duckdb.org/docs/api/java.html, and the other JDBC catalog implementations in Gravitino.
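As a starting point, here is a minimal sketch of talking to DuckDB over plain JDBC, assuming the DuckDB JDBC driver is on the classpath. The class and method names (`DuckDbJdbcSketch`, `jdbcUrl`, `listTables`) are hypothetical and are not part of Gravitino. It only uses the standard `java.sql` API plus DuckDB's `jdbc:duckdb:` URL prefix, where an empty path opens an in-memory database.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Gravitino class.
class DuckDbJdbcSketch {

    // DuckDB JDBC URLs start with "jdbc:duckdb:"; an empty path means an in-memory database.
    static String jdbcUrl(String databasePath) {
        return "jdbc:duckdb:" + (databasePath == null ? "" : databasePath);
    }

    // Lists table names through the standard JDBC metadata API.
    // A null type filter asks the driver for every table type it reports.
    static List<String> listTables(Connection connection) throws SQLException {
        List<String> names = new ArrayList<>();
        DatabaseMetaData metaData = connection.getMetaData();
        try (ResultSet tables = metaData.getTables(null, null, "%", null)) {
            while (tables.next()) {
                names.add(tables.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Without the DuckDB driver on the classpath this fails with "No suitable driver".
        try (Connection connection = DriverManager.getConnection(jdbcUrl(null))) {
            System.out.println(listTables(connection));
        } catch (SQLException e) {
            System.err.println("Could not connect to DuckDB: " + e.getMessage());
        }
    }
}
```

A catalog implementation would probably need more than this (schemas, column types, DDL), so treat it only as a way to get familiar with the driver before reading the existing JDBC catalogs.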

henriquepaes1 commented 8 months ago

I've studied the other JDBC catalogs and I understand the first steps needed to build a new catalog. However, I'm new to Gradle and need help with the following points:

justinmclean commented 8 months ago

For the Gradle configuration, it is as simple as:

xunliu commented 8 months ago

DuckDB is an embedded database, and I was concerned that DuckDB doesn't require a metadata management service, so I created a discussion in the DuckDB community: https://github.com/duckdb/duckdb/discussions/10177

austin362667 commented 8 months ago

I was concerned DuckDB doesn't require a metadata management service

I'm wondering why an embedded database like DuckDB doesn't need a metadata management service.

shaofengshi commented 7 months ago

DuckDB is an embedded database, and I was concerned that DuckDB doesn't require a metadata management service, so I created a discussion in the DuckDB community: duckdb/duckdb#10177

I share Xun's concern. Most of the time, an embedded database is more like a library; it is not going to be discovered or consumed by external applications. So I'm curious about the detailed scenario.

zhoukangcn commented 3 months ago

Perhaps we could utilize DuckDB as an engine instead of a catalog. We can use DuckDB to query data managed by Gravitino.

shaofengshi commented 3 months ago

Perhaps we could utilize DuckDB as an engine instead of a catalog. We can use DuckDB to query data managed by Gravitino.

Agreed, but that is not what this issue describes. Maybe we should create a separate issue to add DuckDB as an engine for Gravitino.

shaofengshi commented 3 months ago

I was concerned DuckDB doesn't require a metadata management service

I'm wondering why an embedded database like DuckDB doesn't need a metadata management service.

Hi Austin, I think Xun's comment was ambiguous. He was responding to this issue's description, which proposes adding DuckDB as a JDBC catalog in Gravitino (similar to MySQL, Postgres, etc.).

From a different perspective, DuckDB could connect to Gravitino to fetch metadata, acting more like an engine (similar to Spark, Trino, etc.); that would make more sense. What do you think? @austin362667