dennyglee opened this issue 2 years ago
Note, we will be adding/updating this issue over the next few weeks, but I'm a little behind schedule so I thought I would get the roadmap discussion started ASAP. Thanks!
Hi folks. Can't see it explicitly mentioned so thought I'd ask - will identity columns support (i.e. writer version 6) be added in this H2 wave? That's a big feature we're keen to be able to use outside of Databricks, and it didn't quite make it into 2.0 by the looks of things.
Thanks for the call out @edfreeman - identity columns have been added :)
Hi @dennyglee, what about Auto compaction and Optimize Write? I don't think the PRs are getting some attention for review / merge. Could you add it to the roadmap?
Oh good point! Let me get back to you on this shortly! Sorry about that!
Hi @dennyglee, what about support for displaying the DDL of delta tables (`SHOW CREATE TABLE`)?
https://github.com/delta-io/delta/issues/1032
https://github.com/delta-io/delta/pull/1255
Good call out @keen85 - let me check with @zpappa on this!
I have some minor style and test updates for this PR to be considered done; I can finish them today and we can try to pull them into 2.1.
How about the delta caching that is present on Databricks? Is there a plan for such a feature?
"Delta caching" is actually a Databricks Runtime engine feature, not part of the format. Caching data on a processing engine's executor/worker nodes is something that can really only be done well by the engine itself, not by a data format. It's unfortunate and confusing that we had marketed it under the "Delta" brand name, even though it's really not part of the "Delta Lake" storage format. So, in short, it's not really possible to open source that as part of Delta Lake.
I have been experimenting with Delta Lake on Google Cloud with DuckDB and it is very promising, but without a local SSD cache it will never be fast enough. Maybe we need a cache for Delta that is independent of the Databricks implementation: Delta knows which files need to be scanned, so keeping a local copy after the first call would be really useful, at least for the standalone reader.
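To illustrate the kind of engine-side cache described above, here's a minimal, hypothetical sketch (not an existing Delta API): a path-keyed local cache that downloads each data file at most once. The `fetch` callback is an assumption standing in for whatever object-store client is in use (e.g. gcsfs or boto3).

```python
import shutil
from pathlib import Path

def cached_path(remote_path: str, fetch, cache_dir: str = "/tmp/delta_cache") -> Path:
    """Return a local copy of a remote data file, downloading it on first use.

    `fetch(remote_path, local_path)` is a hypothetical callback that
    downloads one file; it is an assumption, not part of any Delta API.
    Subsequent calls for the same remote path hit the local copy.
    """
    # Flatten the remote URI into a single cache filename.
    local = Path(cache_dir) / remote_path.replace("://", "/").replace("/", "_")
    local.parent.mkdir(parents=True, exist_ok=True)
    if not local.exists():
        fetch(remote_path, local)
    return local
```

Since the Delta log already tells the reader exactly which parquet files a scan needs, a wrapper like this could sit between the log replay and the scan, warming the cache on first access.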
Hi @dennyglee any update?
Sorry about that @sezruby - yes, we will be adding these to the roadmap very shortly. Thanks for your patience (I’ve been out the last two weeks)
Some quick updates:
HTH!
Hi @dennyglee! I know it was on the 2022 H1 github page, but I haven't seen any mention of clone functionality being moved into the open source library. Is there any update around that? I poked around the current source code but didn't really see it anywhere.
Thanks @p2bauer - great call out. I've added this to the roadmap and created issue #1387 to track this. HTH!
Hi @dennyglee, is there any update on supporting higher protocol versions in Presto and Trino?
Great question @SanthoshPrasad - we've been working with the PrestoDB and Trino communities on this and we should have some updates on progress over the next couple of months. One way we're doing this is through our DAT (Delta Acceptance Testing) effort, so we can more cleanly document and clarify which APIs are on which protocol version. If you're interested in learning more, please join us in the #dat channel in the Delta Users Slack. HTH!
Suggest we add "Airbyte Destination S3: add delta lake/delta table support" to the roadmap, as it's already part of the Delta Rust roadmap - WDYT?
Support JDBC catalog: https://github.com/delta-io/delta/issues/1459
I'd like to suggest adding "Register VACUUM in delta log" to the roadmap
I know that at each commit, min/max values are calculated for each parquet file and stored in the delta log JSON, but how about adding more granularity to the existing data skipping mechanism by using parquet page skipping? Relevant links:
- https://issues.apache.org/jira/browse/PARQUET-922
- https://blog.cloudera.com/speeding-up-select-queries-with-parquet-page-indexes/
Would this be doable?
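For context on the file-level skipping that exists today, here's a minimal, hypothetical sketch (not the actual Delta implementation) of pruning files with the min/max stats already stored in the log's add actions; page-level skipping would push the same overlap test down into the parquet reader. It assumes one JSON action per log line, with `stats` embedded as a JSON string, and ignores remove actions and checkpoints for brevity.

```python
import json

def files_for_query(log_lines, column, lo, hi):
    """File-level data skipping sketch: keep only files whose recorded
    [min, max] range for `column` overlaps the query range [lo, hi].
    Files without stats are kept conservatively.
    """
    keep = []
    for line in log_lines:
        action = json.loads(line)
        add = action.get("add")
        if not add:
            continue  # skip commitInfo, metaData, protocol, etc.
        stats = json.loads(add.get("stats", "{}"))
        mn = stats.get("minValues", {}).get(column)
        mx = stats.get("maxValues", {}).get(column)
        if mn is None or mx is None or (mn <= hi and mx >= lo):
            keep.append(add["path"])
    return keep
```

Page skipping would apply this same interval test per parquet page (via the column/offset indexes from PARQUET-922) instead of per file, so a file that survives pruning can still skip most of its pages.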
@dudzicp Oh, could you please create a separate issue for this and we can discuss the specifics there? Thanks!
How about bucketing?
Hey, I am interested in more details regarding https://delta.io/sharing/ It's stated that Presto and Trino support is coming soon, but I could not really find any details or timelines. Please note that I am asking about Delta Sharing in the context of Unity Catalog in particular, and not necessarily about the delta and Trino/Presto integration.
Ahh, for Delta Sharing features within the context of UC, please ping the Databricks community. Thanks!
Hey @dennyglee, any updates on the roadmap? :) I was creating some issues in the mack project (e.g. Python support for table property update) but wanted to make sure that the delta-spark team is not already working on the things that I came up with.
Thanks for your patience @robertkossendey - we're working on this but admittedly way behind schedule due to all of the various asks, eh?! Saying this, please continue working on the mack project activities, as those are the ones we're pretty sure make more sense for mack to address - or, at least if we do plan to merge them into delta-spark, it'll be further out on the roadmap. Thanks for the ping, eh?!
Good to know, thank you for the update @dennyglee!
@dennyglee @allisonport-db do you have any updates on auto compact and optimize write?
This is a working issue for folks to provide feedback on the prioritization of Delta Lake features spanning July to December 2022. With the release of Delta Lake 2.0, we wanted to take the opportunity to discuss other vital features for prioritization with the community, based on feedback from the Delta Users Slack, Google Groups, Community AMAs (on the Delta Lake YouTube channel), the Roadmap 2022H2 (discussion), and more.
Priority 0
We will focus on these issues and continue to deliver parts (or all) of each issue over the next six months.
`VERSION AS OF` and `TIMESTAMP AS OF` in SELECT statements.
Priority 1
We should be able to deliver parts (or all) of each issue over the next six months.
Priority 2
History