delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

Apache Druid - Delta Connector #1170

Open Thelin90 opened 2 years ago

Thelin90 commented 2 years ago

Feature request

Hello.

Presto has recently created a delta.io adapter to link Delta files directly, which is making me consider using a Presto cluster instead (it would scale decently enough with the data load I currently have to work with). However, has anyone faced this issue and solved it in a “nice” way, or are there any plans to add a Delta connector similar to what Presto has done?

Overview

Here is a reference to what Presto has added: https://prestodb.io/blog/2022/03/15/native-delta-lake-connector-for-presto

Motivation

To utilise Apache Druid for centralising a lakehouse architecture, with delta.io as the base data layer.

Further details

There is also a video: https://www.youtube.com/watch?v=JrXGkqpl7xk (fast forward to 21:40). This is what would be nice to have, but within Apache Druid.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

dennyglee commented 2 years ago

Oh, this is great to hear @Thelin90 - would it help if we found some time to chat on the various design considerations? Are any other folks interested in helping out? Thanks!

jaylynstoesz commented 1 year ago

Bumping this request. Support for ingesting from Delta Lake and/or querying external Delta tables in Druid would be extremely useful. Right now my workaround is to manually parse out file paths from the Delta manifest and submit those in an ingestion spec, which isn't ideal.

I'm happy to contribute to this as well.
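
For readers trying the same workaround, it can be sketched roughly as follows. This assumes the manifest is a symlink-format manifest (a text file with one absolute data file path per line, as produced by Delta's `GENERATE symlink_format_manifest`) and that the files live on S3. The function names, bucket, datasource name, and timestamp column are hypothetical, and the spec is a minimal native batch (`index_parallel`) outline, not a tested production spec.

```python
import json


def manifest_to_uris(manifest_text):
    """Return the non-empty lines of a symlink manifest as a list of file URIs."""
    return [line.strip() for line in manifest_text.splitlines() if line.strip()]


def build_ingestion_spec(uris, datasource, timestamp_column):
    """Build a minimal Druid native batch spec that reads the given Parquet files.

    Assumptions (not from the thread): files are on S3, one timestamp column
    exists, and dimension discovery is left to Druid via an empty dimension list.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # Explicit file list taken from the manifest.
                "inputSource": {"type": "s3", "uris": uris},
                "inputFormat": {"type": "parquet"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


if __name__ == "__main__":
    # Hypothetical manifest contents; in practice, read the manifest file itself.
    example_manifest = (
        "s3://example-bucket/events/part-0000.parquet\n"
        "s3://example-bucket/events/part-0001.parquet\n"
    )
    spec = build_ingestion_spec(
        manifest_to_uris(example_manifest),
        datasource="events",          # hypothetical datasource name
        timestamp_column="event_ts",  # hypothetical timestamp column
    )
    print(json.dumps(spec, indent=2))
```

The resulting JSON would then be submitted to Druid as an ingestion task. A manifest only describes the table as of the time it was generated, so it has to be regenerated and re-parsed after the table changes, which is part of what makes this approach less than ideal.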

abhishekagarwal87 commented 5 months ago

For anyone interested in this connector, @abhishekrb19 has raised a PR here - https://github.com/apache/druid/pull/15755