data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

Add more visibility on resource-link databases imported as datasets #1098

Open dlpzx opened 4 months ago

dlpzx commented 4 months ago

This is an enhancement request. #1021 allows users to import a Glue database that does not originally live in the same AWS account as the S3 bucket. This scenario is very similar to the one described in this blog post: there are data producer accounts where data is stored in S3, and a central catalog account where all Glue databases are created. The Glue databases are then shared back with the data producer accounts as resource link databases using Lake Formation. More schematically:

In AWS:

In data.all:

Data sharing detects the source catalog and shares the Original Glue database, provided the prerequisites are met: Environment A is onboarded in data.all and the Original Glue database is tagged as explained in #1021.
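To illustrate the detection step described above, here is a minimal sketch, not data.all's actual implementation. It assumes the input dict has the shape of the `Database` entry returned by Glue's `get_database`, where a resource link carries a `TargetDatabase` with `CatalogId` and `DatabaseName`. The names `SourceDatabase` and `resolve_source_database`, the account IDs and the database names are hypothetical.

```python
from typing import NamedTuple


class SourceDatabase(NamedTuple):
    """Where the tables really live (hypothetical helper type)."""
    catalog_id: str
    database_name: str
    is_resource_link: bool


def resolve_source_database(database: dict, account_id: str) -> SourceDatabase:
    """Resolve a Glue database entry to its source catalog and database.

    `database` is assumed to look like the `Database` entry of a Glue
    `get_database` response. A resource link has a `TargetDatabase` key
    that points at the original database in another catalog.
    """
    target = database.get("TargetDatabase")
    if target:
        # Resource link: follow it to the original database.
        return SourceDatabase(
            catalog_id=target.get("CatalogId", account_id),
            database_name=target["DatabaseName"],
            is_resource_link=True,
        )
    # Regular database: it is its own source.
    return SourceDatabase(
        catalog_id=database.get("CatalogId", account_id),
        database_name=database["Name"],
        is_resource_link=False,
    )


# Hypothetical example: a resource link in producer account 111111111111
# that points at a database in central catalog account 222222222222.
resource_link = {
    "Name": "sales_link",
    "CatalogId": "111111111111",
    "TargetDatabase": {"CatalogId": "222222222222", "DatabaseName": "sales"},
}
source = resolve_source_database(resource_link, account_id="111111111111")
print(source)
```

Keeping the resolution logic as a pure function over the response dict makes it easy to unit test without AWS credentials; the actual Glue call would sit in a thin wrapper around it.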

Issues:

Solutions:

(we can implement more than one, or other alternatives)

SofiaSazonova commented 4 months ago

As a relatively new user of data.all, I would love to see more instructions directly in the UI. Maybe they shouldn't be shown by default, but it would be nice to have a (?) icon that links to the relevant paragraph in the user guide.

As for the LF locations, I think it's some kind of bug (feature?): we should register the location afterwards. I think we need to put some effort into researching this behaviour.