GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0

Open table format compatibility #4192

Open sunng87 opened 3 months ago

sunng87 commented 3 months ago

What problem does the new feature solve?

We are looking into ideas for integrating GreptimeDB with data lake and data warehouse platforms such as Snowflake, Databricks, and others.

What does the feature do?

There are basically four layers at which we can integrate with those solutions:

| Layer | Solution | Pros | Cons |
| --- | --- | --- | --- |
| Hard copy | Data movement | Once data is properly moved into the data lake or data warehouse, we gain full compatibility | Data movement introduces latency and additional operational work |
| API Integration | Use JDBC or Arrow Flight to integrate GreptimeDB as some sort of external table | Should be easier to integrate | May suffer performance degradation for complex queries |
| Query Plan Integration | Expose query plan APIs and stats APIs for Trino | May provide better performance | Requires additional APIs |
| Table Format Integration | Add a compatibility layer for Iceberg or Delta Lake | Should provide the best compatibility with other query engines, not limited to Databricks | We have our own table format, and those open formats are not designed for high-rate ingestion. A compatibility layer needs some design |

Note that we don't need to switch fully to those open table formats; we only need to provide a compatibility layer on top of our own format.
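To make the compatibility-layer idea a bit more concrete, here is a minimal sketch in Python. It assumes the database persists data as immutable files and can list them with a time range and row count; the layer then publishes a read-only snapshot that an external engine could use to decide which files to scan. All names here (`DataFile`, `build_snapshot`, `prune_files`, the file paths) are hypothetical, and the snapshot layout is a toy format for illustration only, not the Iceberg or Delta Lake metadata format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical descriptor of one immutable data file written by the database."""
    path: str
    min_ts_ms: int  # smallest timestamp in the file, in milliseconds
    max_ts_ms: int  # largest timestamp in the file, in milliseconds
    row_count: int


def build_snapshot(table: str, snapshot_id: int, files: List[DataFile]) -> dict:
    """Summarize the native file list as a read-only snapshot for external engines.

    The layout is a toy format (an assumption for this sketch), not a real
    open table format.
    """
    if not files:
        raise ValueError("a snapshot needs at least one data file")
    return {
        "table": table,
        "snapshot_id": snapshot_id,
        "total_rows": sum(f.row_count for f in files),
        "min_ts_ms": min(f.min_ts_ms for f in files),
        "max_ts_ms": max(f.max_ts_ms for f in files),
        # Sort by start time so readers can scan files in time order.
        "files": [asdict(f) for f in sorted(files, key=lambda f: f.min_ts_ms)],
    }


def prune_files(snapshot: dict, start_ms: int, end_ms: int) -> List[str]:
    """Return paths of files whose time range overlaps [start_ms, end_ms]."""
    return [
        f["path"]
        for f in snapshot["files"]
        if f["max_ts_ms"] >= start_ms and f["min_ts_ms"] <= end_ms
    ]


if __name__ == "__main__":
    example_files = [
        DataFile("data/b.parquet", 2000, 2999, 50),
        DataFile("data/a.parquet", 1000, 1999, 100),
    ]
    snapshot = build_snapshot("cpu_metrics", 1, example_files)
    print(json.dumps(snapshot, indent=2))
    print(prune_files(snapshot, 2500, 3500))
```

The point of the sketch is that the native write path stays untouched: ingestion keeps producing files in our own layout, and only the published metadata is shaped for outside readers. A real layer would have to emit metadata that follows the target format's specification and handle compaction and deletes, which is the part that needs design.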

Implementation challenges

No response