delta-io / delta-sharing

An open protocol for secure data sharing
https://delta.io/sharing
Apache License 2.0

Update the list of shared tables dynamically #44

Open andrey-puzyr opened 3 years ago

andrey-puzyr commented 3 years ago

We have a Delta Lake bucket with thousands of tables, and the set of tables changes several times a day (it is a multitenant system; each tenant has its own group of separate Delta tables). A separate service (our internal Python ML tool) needs to read this data. The Delta Sharing server looks great, but we find it quite difficult to use a static share configuration in this case. Do you have any plans to support resolving tables dynamically?

I can think of two possible ways to do this:

  1. The Delta Sharing server periodically calls a webhook that returns the current set of tables (each with its own s3a:// path and full logical name share.schema.table).
  2. Tables are defined in the YAML config with a wildcard, for example s3a://some-bucket/tenant_*/user_actions, and the server periodically refreshes the actual set of matching tables.

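As a rough illustration of the first option, here is a minimal Python sketch of one server-side polling step. It assumes a hypothetical webhook that returns a JSON list of objects with a `name` (`share.schema.table`) and a `location` field; neither that payload format nor the helper names (`parse_webhook_payload`, `build_shares`, `refresh`) come from Delta Sharing. The output only mirrors the shares/schemas/tables nesting of the reference server's YAML config.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List


def parse_webhook_payload(payload: str) -> List[Dict[str, str]]:
    # Hypothetical payload: [{"name": "share.schema.table", "location": "s3a://..."}, ...]
    entries = json.loads(payload)
    for entry in entries:
        # Reject logical names that are not exactly three dot-separated parts.
        if len(entry["name"].split(".")) != 3:
            raise ValueError(f"expected share.schema.table, got {entry['name']!r}")
    return entries


def build_shares(entries: List[Dict[str, str]]) -> Dict:
    # Group the flat entries into share -> schema -> list of tables.
    tree = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        share, schema, table = entry["name"].split(".")
        tree[share][schema].append({"name": table, "location": entry["location"]})
    # Emit a nested structure shaped like the reference server's "shares" section.
    return {
        "shares": [
            {
                "name": share,
                "schemas": [
                    {"name": schema, "tables": tables}
                    for schema, tables in sorted(schemas.items())
                ],
            }
            for share, schemas in sorted(tree.items())
        ]
    }


def refresh(fetch: Callable[[], str]) -> Dict:
    # One polling step: call the (injected) webhook and rebuild the config from scratch.
    return build_shares(parse_webhook_payload(fetch()))


if __name__ == "__main__":
    sample = json.dumps([
        {"name": "tenants.tenant_a.user_actions",
         "location": "s3a://some-bucket/tenant_a/user_actions"},
    ])
    print(json.dumps(refresh(lambda: sample), indent=2))
```

In a real deployment `fetch` could wrap something like `urllib.request.urlopen(url).read().decode()` and run on a timer; it is injected here so the sketch runs without a network, and rebuilding from scratch on each poll keeps removed tables from lingering.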
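The second option amounts to expanding a path pattern against the table directories that currently exist. The Python sketch below assumes the caller already has that listing (for example from an object-store listing, not shown) and uses a hypothetical naming rule: the schema is the path segment the wildcard matches (the tenant directory) and the table is the last segment. It matches segment by segment because a plain `fnmatch` `*` would also match across `/`.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Tuple

WILDCARD_CHARS = "*?["


def segment_match(pattern: str, location: str) -> bool:
    # Compare path segments pairwise so "*" never spans a "/".
    pat_parts = pattern.rstrip("/").split("/")
    loc_parts = location.rstrip("/").split("/")
    return len(pat_parts) == len(loc_parts) and all(
        fnmatchcase(loc, pat) for loc, pat in zip(loc_parts, pat_parts)
    )


def expand_wildcard(share: str, pattern: str, locations: Iterable[str]) -> List[Dict[str, str]]:
    pat_parts = pattern.rstrip("/").split("/")
    # Hypothetical rule: the first wildcarded segment becomes the schema name.
    wild_idx = next(
        (i for i, part in enumerate(pat_parts) if any(c in part for c in WILDCARD_CHARS)),
        None,
    )
    if wild_idx is None:
        raise ValueError(f"pattern has no wildcard: {pattern!r}")
    entries = []
    for loc in sorted(set(locations)):
        if segment_match(pattern, loc):
            parts = loc.rstrip("/").split("/")
            entries.append({"name": f"{share}.{parts[wild_idx]}.{parts[-1]}", "location": loc})
    return entries


def diff_tables(old: List[Dict[str, str]], new: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    # Names to add and names to remove between two refreshes.
    old_names = {e["name"] for e in old}
    new_names = {e["name"] for e in new}
    return sorted(new_names - old_names), sorted(old_names - new_names)


if __name__ == "__main__":
    listing = [
        "s3a://some-bucket/tenant_a/user_actions",
        "s3a://some-bucket/tenant_b/user_actions",
        "s3a://some-bucket/tenant_b/orders",
    ]
    for entry in expand_wildcard("tenants", "s3a://some-bucket/tenant_*/user_actions", listing):
        print(entry["name"], entry["location"])
```

Re-running `expand_wildcard` on a schedule and comparing successive results with `diff_tables` gives the tables to add and remove on each refresh.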
rtyler commented 3 years ago

@andrey-puzyr the Rust-based alternative Delta Sharing implementation [Riverbank](https://github.com/delta-incubator/riverbank) does have database-backed table/share configuration, but there is no "public API" (beyond what its web forms use) for creating them, since the reference implementation doesn't define an API for authenticated adding of tables/shares.
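To illustrate what database-backed configuration buys in this situation, here is a minimal sketch using Python's built-in `sqlite3`. The table layout and function names are hypothetical and are not Riverbank's actual schema or code (Riverbank is written in Rust); the point is only that adding or removing a shared table becomes a row insert or delete instead of a config-file edit.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical registry layout: one row per shared table.
DDL = """
CREATE TABLE IF NOT EXISTS shared_tables (
    share_name  TEXT NOT NULL,
    schema_name TEXT NOT NULL,
    table_name  TEXT NOT NULL,
    location    TEXT NOT NULL,
    PRIMARY KEY (share_name, schema_name, table_name)
)
"""


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(DDL)
    return conn


def upsert_table(conn: sqlite3.Connection, share: str, schema: str, table: str, location: str) -> None:
    # INSERT OR REPLACE overwrites the location if the logical name already exists.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO shared_tables VALUES (?, ?, ?, ?)",
            (share, schema, table, location),
        )


def remove_table(conn: sqlite3.Connection, share: str, schema: str, table: str) -> None:
    with conn:
        conn.execute(
            "DELETE FROM shared_tables WHERE share_name = ? AND schema_name = ? AND table_name = ?",
            (share, schema, table),
        )


def resolve(conn: sqlite3.Connection, share: str, schema: str, table: str) -> Optional[str]:
    # Look up the storage location for a logical share.schema.table name.
    row = conn.execute(
        "SELECT location FROM shared_tables WHERE share_name = ? AND schema_name = ? AND table_name = ?",
        (share, schema, table),
    ).fetchone()
    return row[0] if row else None


def list_tables(conn: sqlite3.Connection, share: str) -> List[Tuple[str, str]]:
    return conn.execute(
        "SELECT schema_name, table_name FROM shared_tables WHERE share_name = ? "
        "ORDER BY schema_name, table_name",
        (share,),
    ).fetchall()
```

A server reading from such a registry on each request (or caching it briefly) would pick up new tenant tables without a restart; who is allowed to write to it is exactly the authentication question left open above.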

I'd be more than happy to discuss this in the #delta-sharing channel in the Delta project's Slack workspace (join details are on delta.io).