data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

Granular sharing using Lake Formation #348

Closed chamiles closed 3 months ago

chamiles commented 1 year ago

The customer has enabled granular (row, column, cell) sharing using Lake Formation and would like to see that capability in the data.all sharing request and approval areas, so that data owners can share the same datasets with restricted access to columns without having to create a duplicate dataset and duplicate data.

The customer's current solution is to use data filters and type in a manual expression. The simplest option may be to open a text area for entering the expression, but the best user experience would show column names as visual checkboxes, with the ability to enter expressions for rows and cells.

https://docs.aws.amazon.com/lake-formation/latest/dg/data-filters-about.html
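As a rough illustration of what such a filter looks like at the API level, here is a minimal sketch that builds the `TableData` payload for Lake Formation's `CreateDataCellsFilter` call via boto3. The helper names `build_data_cells_filter` and `create_data_filter`, and all account, database, table and column values, are hypothetical and not taken from data.all.

```python
from typing import Optional


def build_data_cells_filter(
    catalog_id: str,
    database: str,
    table: str,
    filter_name: str,
    included_columns: Optional[list] = None,
    row_expression: Optional[str] = None,
) -> dict:
    """Build the TableData payload for CreateDataCellsFilter (hypothetical helper)."""
    table_data = {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": filter_name,
    }
    # Column-level restriction: expose only the listed columns,
    # or fall back to a wildcard that exposes every column.
    if included_columns:
        table_data["ColumnNames"] = list(included_columns)
    else:
        table_data["ColumnWildcard"] = {}
    # Row-level restriction: a filter expression, or all rows.
    if row_expression:
        table_data["RowFilter"] = {"FilterExpression": row_expression}
    else:
        table_data["RowFilter"] = {"AllRowsWildcard": {}}
    return table_data


def create_data_filter(table_data: dict) -> None:
    """Send the payload to Lake Formation (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without AWS access

    boto3.client("lakeformation").create_data_cells_filter(TableData=table_data)
```

A checkbox-style UI could feed the selected column names into `included_columns`, while the free-text expression would map to `row_expression`.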

dlpzx commented 1 year ago

Hi @chamiles, we opened an issue for column-level sharing a while ago (#84), but there was not much interest. Instead we opted to implement tag-based access control using Lake Formation, which also allows column-level granularity. That feature is currently in progress and is being developed in issue #186.

For row/cell level filtering we can work together on implementing this sort of data filters in shared tables.

anmolsgandhi commented 5 months ago

Bumping this up. This issue will be used to track the enhancement and extension of the current data.all sharing functionality to include column-level and row-level access control in Lake Formation. This has been one of the top requests from customers based on our conversations. It will be included as a feature enhancement for completion in v2.7.

zsaltys commented 4 months ago

@anmolsgandhi I think it is important that we get the UI right to make it clear that this only applies to table sharing, and that if you grant S3 access you are effectively bypassing this security measure, because the user will have access to the entire bucket and the dataset behind the table. Let's discuss this.

noah-paige commented 4 months ago

Hi @zsaltys - I am starting to work on the design for this and can start sharing some mock-ups of the UI here as I continue to make progress this week.

I agree that some type of warning or callout from the FE would be helpful to ensure the user understands what they are doing when sharing data objects - but I think we can add warnings more on the bucket sharing side, because sharing an entire bucket is akin to sharing ALL folders and tables in data.all.

I will post additional design description on this issue today

noah-paige commented 4 months ago

Design


Assumptions


User Experience


Creating a Data Filter

Applying Data Filter to a Share

Additional Considerations

Mock Ups of FE Design


List Data Filters View [IMAGE]


Delete Data Filter [VIDEO]

https://github.com/user-attachments/assets/3dae0bd2-916f-4f26-bce1-67af5ad4d1da

Create a New Data Filter [VIDEO]

https://github.com/user-attachments/assets/b6088731-527e-4c8e-a403-faf2d5dd3273

Attaching Data Filter(s) to Share [VIDEO]

https://github.com/user-attachments/assets/9414c99f-43c7-49b2-8dba-266d62a197e1

noah-paige commented 4 months ago

Progress Tracker


Testing (MOVED TO PR COMMENTS)

noah-paige commented 4 months ago

Table Sharing Finding

Current Approach

Currently, when we share a table cross-account, we take the following steps:

        0) Check if source account details are properly initialized and initialize the Glue and LF clients
        1) Grant ALL permissions to pivotRole for source database in source account
        2) Create the shared database in target account if it doesn't exist
        3) Grant permissions to pivotRole and principals to "shared" database
        4) For each shared table:
            a) Update its status to SHARE_IN_PROGRESS with Action Start
            b) Check that the table exists in the Glue catalog; if not, raise an error and flag the share item status as failed
            c) If it is a cross-account share:
                c.1) Revoke iamallowedgroups permissions from table
                c.2) Grant target account permissions to original table -> create RAM invitation
                c.3) Accept pending RAM invitation
            d) Create resource link for table in target account
            e) If it is a cross-account share: Grant permission to principals to RAM-shared table in target account
            f) Grant permission to principals to resource link table
            g) Update share item status to SHARE_SUCCESSFUL with Action Success
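The per-table branching in step 4 can be sketched as a small planning function. This is only an illustration of the ordering described above, not the data.all implementation; `plan_table_share` is a hypothetical name and the step strings are paraphrased labels.

```python
def plan_table_share(table_name: str, cross_account: bool) -> list:
    """Return the ordered per-table steps (4a-4g) for one shared table."""
    steps = [
        f"a) set {table_name} status to SHARE_IN_PROGRESS",
        f"b) verify {table_name} exists in the Glue catalog",
    ]
    if cross_account:
        # Only cross-account shares go through RAM.
        steps += [
            "c.1) revoke iamallowedgroups permissions from table",
            "c.2) grant target account permissions on original table (creates RAM invitation)",
            "c.3) accept pending RAM invitation",
        ]
    steps.append("d) create resource link in target account")
    if cross_account:
        steps.append("e) grant principals permissions on RAM-shared table")
    steps.append("f) grant principals permissions on resource link")
    steps.append(f"g) set {table_name} status to SHARE_SUCCESSFUL")
    return steps


if __name__ == "__main__":
    for step in plan_table_share("TableX", cross_account=True):
        print(step)
```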

Most importantly - we reuse the shared DB and the resource link table in the target account and then add additional grants for new principals who are approved for access to the shared data.


Moving to Data Filters

When it comes to data filters - herein lies an issue because:


Looking for a Solution

noah-paige commented 4 months ago

If following along with Option 3 above - adding additional details here

Pre-Reqs

Cross-account grants made using the named resource method are compatible across different versions. Even if the grantor account is using an older version (version 1 or 2) and the recipient account is using a newer version (version 3 or higher), the cross-account access functionality operates seamlessly without any compatibility issues or errors.

To share resources directly with IAM principals in another account, only the grantor needs to use version 3.

https://docs.aws.amazon.com/lake-formation/latest/dg/optimize-ram.html
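A minimal sketch of how the grantor-side pre-requisite could be checked, assuming the version is read from the `CROSS_ACCOUNT_VERSION` key in the `Parameters` map of the Lake Formation data lake settings. The function names are hypothetical, and treating a missing key as version 1 is an assumption of this sketch.

```python
def supports_direct_iam_principal_sharing(data_lake_settings: dict) -> bool:
    """True if the grantor account's cross-account version is 3 or higher."""
    params = data_lake_settings.get("Parameters") or {}
    # Assumption: a missing key is treated as the oldest version.
    version = params.get("CROSS_ACCOUNT_VERSION", "1")
    try:
        return int(version) >= 3
    except (TypeError, ValueError):
        return False


def fetch_data_lake_settings() -> dict:
    """Read the settings from the current account (requires AWS credentials)."""
    import boto3  # imported lazily so the check above works without AWS access

    client = boto3.client("lakeformation")
    return client.get_data_lake_settings()["DataLakeSettings"]
```

Only the source (grantor) account would need to pass this check before sharing directly with IAM principals in the target account.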

Comparison

In the proposed option - we do the same DB steps as before which is

But instead of

We do

Originally, if TableX was shared cross-account to the same target account for GroupA and GroupB, we would have

Now, if TableX was shared cross-account to the same target account for GroupA (w/ Filter1) and GroupB (w/ Filter2)
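For illustration only, per-group filter grants could be expressed as Lake Formation `GrantPermissions` requests that target a `DataCellsFilter` resource. This is a sketch under assumptions: `build_filter_grant` is a hypothetical helper, GroupA and GroupB are assumed to map to IAM role ARNs, and every account ID, role name, database and filter name below is a placeholder.

```python
def build_filter_grant(
    principal_arn: str,
    source_account_id: str,
    database: str,
    table: str,
    filter_name: str,
) -> dict:
    """Build GrantPermissions kwargs granting SELECT through one data filter."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": source_account_id,
                "DatabaseName": database,
                "TableName": table,
                "Name": filter_name,
            }
        },
        "Permissions": ["SELECT"],
    }


# Placeholder principals: one request per (group, filter) pair.
grants = [
    build_filter_grant("arn:aws:iam::444455556666:role/GroupA", "111122223333", "db", "TableX", "Filter1"),
    build_filter_grant("arn:aws:iam::444455556666:role/GroupB", "111122223333", "db", "TableX", "Filter2"),
]

# With credentials for the source account, each request could be sent with:
#   boto3.client("lakeformation").grant_permissions(**request)
```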

Errors

Could not grant principal QS_GROUP_ARN permissions ['DESCRIBE', 'SELECT'] and permissions with grant options None to {'TableWithColumns': {'DatabaseName': 'DB_NAME', 'Name': 'TABLE_NAME', 'ColumnWildcard': {}, 'CatalogId': 'SOURCE_ACCOUNT'}} due to: An error occurred (InvalidInputException) when calling the GrantPermissions operation: Cross account requests are only allowed for AWS Accounts, Organizations, IAM Principals and All IAMPrincipals
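The error suggests that a principal such as a QuickSight group ARN cannot be the target of a cross-account grant. A minimal pre-check sketch follows, assuming the four principal kinds named in the error message; the function name and the ARN patterns are simplifications introduced here, not an official validation rule.

```python
import re

# Simplified patterns for the principal kinds listed in the error message.
_PATTERNS = (
    re.compile(r"^\d{12}$"),  # AWS account ID
    re.compile(r"^\d{12}:IAMPrincipals$"),  # all IAM principals of an account
    re.compile(r"^arn:aws[\w-]*:iam::\d{12}:(role|user)/.+$"),  # IAM principal
    re.compile(r"^arn:aws[\w-]*:organizations::\d{12}:(organization|ou)/.+$"),  # org or OU
)


def is_cross_account_grantable(principal_identifier: str) -> bool:
    """Return True if the identifier looks like a principal accepted cross-account."""
    return any(p.match(principal_identifier) for p in _PATTERNS)


if __name__ == "__main__":
    qs_group = "arn:aws:quicksight:us-east-1:111122223333:group/default/analysts"
    print(is_cross_account_grantable(qs_group))
```

Running a check like this before calling `GrantPermissions` would let the share fail early with a clearer message for unsupported principals.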

dlpzx commented 4 months ago

Hi @noah-paige, I love the UI views! Here are some remarks on the design and the table findings:

noah-paige commented 4 months ago

Findings from testing: