data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

Support for table sharing when a catalog account is being used #904

Closed blitzmohit closed 6 months ago

blitzmohit commented 9 months ago

Is your feature request related to a problem? Please describe.

In certain data mesh architectures, such as the ones described in https://aws.amazon.com/blogs/big-data/design-a-data-mesh-architecture-using-aws-lake-formation-and-aws-glue/ and https://aws.amazon.com/blogs/big-data/how-jpmorgan-chase-built-a-data-mesh-architecture-to-drive-significant-value-to-enhance-their-enterprise-data-platform/, a catalog account owns the Glue database and tables instead of the producer.

Currently data.all does not account for or support sharing of tables using a catalog account.

If a dataset is imported using a database that was shared with the producer account from a catalog account, i.e. a resource link, the import works fine. However, if any attempt is made to share access to tables in such a dataset outside the same producer account, data.all fails because Lake Formation does not allow resharing of databases or tables.

Describe the solution you'd like

Proposed solution is as follows:

  1. On share approval, detect if the source Glue database is a resource link
  2. If it is a resource link then identify the catalog account
  3. Check that data.all has access to this account, i.e. it should be onboarded as a data.all environment
  4. Validate permissions, i.e. check that the dataset owner approving the request is allowed to share this table/database. To support this, we check for the tag “owner-account-id” on the database, whose value should be the same as the dataset owner’s account ID.
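
To make the four steps concrete, here is a minimal Python sketch of the check. It is not data.all's implementation. It assumes a boto3-style Glue client (`get_database` and `get_tags`), that the “owner-account-id” tag is read from the database in the catalog account, and that the ARN partition is `aws`. The function name `resolve_catalog_source`, the `get_catalog_glue_client` factory and the `onboarded_account_ids` set are hypothetical names introduced only for illustration.

```python
from typing import Callable, Optional, Set, Tuple

OWNER_TAG = "owner-account-id"  # tag key named in the proposal


def resolve_catalog_source(
    glue_client,                      # e.g. boto3.client("glue") in the producer account
    get_catalog_glue_client: Callable[[str], object],  # hypothetical factory: account id -> Glue client
    onboarded_account_ids: Set[str],  # hypothetical: accounts onboarded as data.all environments
    producer_account_id: str,
    database_name: str,
    region: str,
) -> Optional[Tuple[str, str]]:
    """Return (catalog_account_id, catalog_database_name) for a resource link,
    None for a regular database, and raise ValueError if a check fails."""
    # Step 1: a Glue database that is a resource link carries a TargetDatabase entry.
    database = glue_client.get_database(
        CatalogId=producer_account_id, Name=database_name
    )["Database"]
    target = database.get("TargetDatabase")
    if not target:
        return None  # not a resource link, the regular share flow applies

    # Step 2: identify the catalog account that owns the real database.
    catalog_account_id = target["CatalogId"]
    catalog_database = target["DatabaseName"]

    # Step 3: the catalog account must be known to data.all.
    if catalog_account_id not in onboarded_account_ids:
        raise ValueError(
            f"Catalog account {catalog_account_id} is not onboarded as an environment"
        )

    # Step 4: the owner tag on the catalog database must match the producer account.
    # Assumption: standard 'aws' partition and same region as the resource link.
    database_arn = (
        f"arn:aws:glue:{region}:{catalog_account_id}:database/{catalog_database}"
    )
    catalog_glue = get_catalog_glue_client(catalog_account_id)
    tags = catalog_glue.get_tags(ResourceArn=database_arn).get("Tags", {})
    if tags.get(OWNER_TAG) != producer_account_id:
        raise ValueError(
            f"Tag {OWNER_TAG} on {catalog_database} does not match {producer_account_id}"
        )

    return catalog_account_id, catalog_database
```

The Glue clients are passed in rather than created inside the function so that the cross-account credentials, which depend on how each environment is onboarded, stay outside the check itself.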

Additional context

In terms of support, the catalog could be an additional high-level object in data.all that could power additional use cases.

dlpzx commented 8 months ago

Discussion happening directly in PR #905

noah-paige commented 6 months ago

Closing this issue - as completed in #1021