delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Feature Request] Enable new fast-list-from feature for S3 multi-cluster writer #1471

Open scottsand-db opened 2 years ago

scottsand-db commented 2 years ago

Feature request

Overview

https://github.com/delta-io/delta/pull/1210 allows S3SingleDriverLogStore to use the S3 startAfter param so that LogStore list calls are faster. This feature only works on S3A file systems.

The feature: We should update S3DynamoDBLogStore to use startAfter for its list calls, too.
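
To illustrate why a start-after listing helps, here is a minimal in-memory sketch in Python. It is not the Delta or S3A implementation: the helper names (delta_log_key, list_pages, list_from), the 1000-key page size, and the separate existence check for the start key are all assumptions made for this example. It only models the idea that a listing which begins right after a given key skips the pages of older log files.

```python
from bisect import bisect_right

PAGE_SIZE = 1000  # assumed page size for the simulated list requests


def delta_log_key(version: int) -> str:
    """Hypothetical helper: zero-padded commit file name, so string order matches version order."""
    return f"_delta_log/{version:020d}.json"


def list_pages(sorted_keys, prefix, start_after=None, page_size=PAGE_SIZE):
    """Simulate a paged listing: yield pages of keys under prefix, strictly after start_after."""
    lo = bisect_right(sorted_keys, start_after) if start_after is not None else 0
    matching = [k for k in sorted_keys[lo:] if k.startswith(prefix)]
    if not matching:
        yield []  # an empty listing still costs one request
        return
    for i in range(0, len(matching), page_size):
        yield matching[i:i + page_size]


def list_from(sorted_keys, prefix, start_key, use_start_after):
    """Return (keys >= start_key, simulated request count)."""
    requests = 0
    result = []
    if use_start_after:
        # Start-after is exclusive, so this sketch checks the start key separately
        # (an assumption of the example, counted as one extra request).
        requests += 1
        if start_key in sorted_keys:
            result.append(start_key)
        for page in list_pages(sorted_keys, prefix, start_after=start_key):
            requests += 1
            result.extend(page)
    else:
        # Without start-after, every page under the prefix is fetched and filtered locally.
        for page in list_pages(sorted_keys, prefix):
            requests += 1
            result.extend(k for k in page if k >= start_key)
    return result, requests


keys = [delta_log_key(v) for v in range(5000)]
start = delta_log_key(4990)
slow_keys, slow_requests = list_from(keys, "_delta_log/", start, use_start_after=False)
fast_keys, fast_requests = list_from(keys, "_delta_log/", start, use_start_after=True)
print(len(fast_keys), slow_requests, fast_requests)
```

In this toy setup both calls return the same ten commit files, but the plain listing issues five simulated requests while the start-after listing issues two. The gap grows with the number of older files under the log prefix.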

Motivation

We want to bring the same list performance improvements made in #1210 for single-driver writes to our multi-cluster writer.

scottsand-db commented 1 year ago

Here's a prototype PR if someone wants to pick up this work: https://github.com/delta-io/delta/pull/1612