data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0
221 stars, 78 forks

Support WRITE access for consumer roles #1332

Status: Open. zsaltys opened this issue 1 month ago.

zsaltys commented 1 month ago

Is your idea related to a problem? Please describe.
Currently, data.all grants only READ access to consumer IAM roles. However, organizations like ours also need to manage WRITE access, i.e. to define which roles can write to which S3 buckets or which databases. Otherwise, we end up managing read-only access in data.all and all write access outside of it. We would like to unify the management of both read and write access in data.all.

Describe the solution you'd like
We need to think this through in the context of:

S3 bucket sharing: Overall, this is a very simple change. The IAM role just needs to be granted s3:PutObject plus the KMS permissions required to encrypt objects with the bucket's key. I would not want to grant DELETE or anything else besides PUT. Perhaps the set of permissions to grant could be configurable via config, so that organizations can decide. Any extra permissions could then be granted by users themselves on the IAM role, using IaC, when needed.
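As a rough illustration of the S3 side, the following Python sketch builds the extra IAM policy statements a WRITE share might attach to a consumer role. The function name `build_s3_write_statements`, the statement `Sid`s and both action lists are hypothetical, not data.all's actual code; it assumes a PUT-only default that a config setting could override. Uploading to an SSE-KMS bucket requires `kms:GenerateDataKey` on the key; `kms:Encrypt` is included to match the wording above.

```python
import json
from typing import Iterable, List, Optional

# Hypothetical default, as proposed in the issue: PUT only, no DELETE.
DEFAULT_S3_WRITE_ACTIONS = ("s3:PutObject",)

# Uploading to an SSE-KMS bucket needs kms:GenerateDataKey on the key;
# kms:Encrypt is listed to mirror the issue's wording.
KMS_WRITE_ACTIONS = ("kms:GenerateDataKey", "kms:Encrypt")


def build_s3_write_statements(
    bucket: str,
    prefix: str = "",
    kms_key_arn: Optional[str] = None,
    actions: Iterable[str] = DEFAULT_S3_WRITE_ACTIONS,
) -> List[dict]:
    """Return IAM policy statements granting write access to a bucket prefix.

    `actions` stands in for a hypothetical config setting that lets an
    organization widen the grant (e.g. add s3:DeleteObject).
    """
    path = prefix.strip("/")
    resource = f"arn:aws:s3:::{bucket}/{path}/*" if path else f"arn:aws:s3:::{bucket}/*"
    statements = [
        {
            "Sid": "DataAllShareS3Write",
            "Effect": "Allow",
            "Action": sorted(set(actions)),
            "Resource": resource,
        }
    ]
    if kms_key_arn:
        # Only needed when the bucket is encrypted with a customer-managed key.
        statements.append(
            {
                "Sid": "DataAllShareKmsWrite",
                "Effect": "Allow",
                "Action": list(KMS_WRITE_ACTIONS),
                "Resource": kms_key_arn,
            }
        )
    return statements


if __name__ == "__main__":
    key = "arn:aws:kms:eu-west-1:111122223333:key/example"
    print(json.dumps(build_s3_write_statements("sales-bucket", "raw/", key), indent=2))
```

Keeping the action list as a parameter means the "PUT only" default and an organization's wider choice go through the same code path, so the health checker can derive its expectations from the same setting.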

Access points: I don't know much about these, but I suspect they work similarly to S3 bucket sharing. I hope the team can clarify this on the ticket.
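If access points do behave like bucket sharing, the main difference for a WRITE grant would likely be the resource: access point policies address objects through the access point ARN (`arn:aws:s3:<region>:<account-id>:accesspoint/<name>/object/<key>`) rather than the bucket ARN. The minimal Python sketch below assumes that write access would be added to the access point policy with the consumer role as principal, just like read access; the helper names are hypothetical.

```python
def access_point_object_arn(region: str, account_id: str, access_point: str, prefix: str = "") -> str:
    """Build the object-level ARN used in an access point policy (hypothetical helper)."""
    path = prefix.strip("/")
    base = f"arn:aws:s3:{region}:{account_id}:accesspoint/{access_point}/object"
    return f"{base}/{path}/*" if path else f"{base}/*"


def access_point_write_statement(
    region: str, account_id: str, access_point: str, role_arn: str, prefix: str = ""
) -> dict:
    """Access point policy statement letting a consumer role PUT objects.

    Assumption: write access through an access point is granted the same way
    as read access, i.e. in the access point policy with the role as principal.
    """
    return {
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["s3:PutObject"],  # same PUT-only default as bucket sharing
        "Resource": access_point_object_arn(region, account_id, access_point, prefix),
    }
```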

Lake Formation: The question is which permissions we should grant here. We could limit them to the basics needed to add new partitions (I don't know offhand which permission controls that), which is all most writing roles will ever need. Or we could also grant permissions such as DROP, INSERT, DELETE and CREATE. To be most useful to everyone, I think WRITE should grant full write access, i.e. all the permissions that are not currently granted. However, this could be inconsistent with the S3 permissions if, for example, we grant only PUT on S3 but DELETE on the database. Perhaps this could also be configurable, but the default should be consistent across all share types.
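One way to keep the defaults consistent is to define the Lake Formation permission sets in config next to the S3 actions and check one against the other. In the sketch below, the profile contents and the INSERT/DELETE to S3 mapping are assumptions for illustration; the permission names themselves (SELECT, INSERT, DELETE, DESCRIBE, ALTER, DROP) are standard Lake Formation table permissions.

```python
from typing import Dict, Iterable, List

# Hypothetical config: Lake Formation table permissions per access level.
LF_TABLE_PERMISSIONS: Dict[str, List[str]] = {
    "READ": ["SELECT", "DESCRIBE"],
    "WRITE": ["SELECT", "DESCRIBE", "INSERT", "DELETE", "ALTER", "DROP"],
}

# Assumed correspondence between Lake Formation data permissions and the
# S3 actions a writer needs on the table's underlying location.
LF_TO_S3 = {"INSERT": "s3:PutObject", "DELETE": "s3:DeleteObject"}


def find_inconsistencies(lf_permissions: Iterable[str], s3_actions: Iterable[str]) -> List[str]:
    """Report Lake Formation permissions whose matching S3 action is not granted."""
    granted_lf = set(lf_permissions)
    granted_s3 = set(s3_actions)
    problems = []
    for lf_perm, s3_action in LF_TO_S3.items():
        if lf_perm in granted_lf and s3_action not in granted_s3:
            problems.append(f"{lf_perm} granted in Lake Formation but {s3_action} not granted on S3")
    return problems
```

Run at startup against the configured profiles, a check like this would flag exactly the PUT-only-on-S3 / DELETE-on-the-database mismatch described above.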

Redshift: I don't know enough about this to comment.

My proposal is that WRITE access be defined when creating a share. We don't want to define it when registering a consumer role, because the same consumer role could be used both for write access in the same account and for read access in other accounts (e.g. an EMR role). We could define WRITE access either per share item or for the entire share; I think defining it for the entire share makes sense. The default should be READ-only. WRITE access should only be selectable (and validated by the backend) if the consumer role and the dataset belong to the same environment.

The only problem I see is that share items currently let you select specific tables, so we could grant write access to those tables. But how do we grant CREATE access on the database? Doing it implicitly could confuse the user. And if the user doesn't select any tables, should we still grant WRITE access to the database? The only solution I can think of is a new share item type for the database itself.
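The proposal in the previous comment (a share-level access level, READ by default, WRITE only within one environment, and an explicit database share item that unlocks CREATE_TABLE) could be modelled roughly as follows. All class, field and function names are hypothetical and do not correspond to data.all's actual models; DESCRIBE and CREATE_TABLE are real Lake Formation database permissions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AccessLevel(Enum):
    READ = "READ"
    WRITE = "WRITE"


class ItemType(Enum):
    DATABASE = "Database"  # proposed new item type for the database itself
    TABLE = "Table"
    S3_BUCKET = "S3Bucket"


@dataclass
class ShareItem:
    item_type: ItemType
    name: str


@dataclass
class Share:
    dataset_env_uri: str
    consumer_env_uri: str
    items: List[ShareItem] = field(default_factory=list)
    access_level: AccessLevel = AccessLevel.READ  # READ-only by default


class ShareValidationError(ValueError):
    pass


def validate_share(share: Share) -> None:
    """Backend check: WRITE is only allowed within a single environment."""
    if share.access_level is AccessLevel.WRITE and share.dataset_env_uri != share.consumer_env_uri:
        raise ShareValidationError(
            "WRITE access requires the consumer role and dataset to be in the same environment"
        )


def database_permissions(share: Share) -> List[str]:
    """Grant CREATE_TABLE only when the database itself is an explicit share item."""
    has_db_item = any(item.item_type is ItemType.DATABASE for item in share.items)
    if share.access_level is AccessLevel.WRITE and has_db_item:
        return ["DESCRIBE", "CREATE_TABLE"]
    return ["DESCRIBE"]
```

Making the database an explicit item avoids the implicit grant: a WRITE share that selects only tables never receives CREATE_TABLE on the database.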

We must also make sure the share validator / health checker is aware of the new extra permissions.
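For the share validator / health checker, the core change is that the expected permission set now depends on the share's access level. A minimal Python sketch, assuming the verifier reads the consumer role's policy document as JSON and compares exact resource strings; real IAM evaluation also handles wildcards, conditions and Deny statements, which this ignores. The function names and the expected-action table are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical expected S3 actions per access level.
EXPECTED_S3_ACTIONS: Dict[str, Set[str]] = {
    "READ": {"s3:GetObject"},
    "WRITE": {"s3:GetObject", "s3:PutObject"},
}


def _as_list(value) -> list:
    """IAM allows a single value or a list for Statement, Action and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def granted_actions(policy: dict, resource: str) -> Set[str]:
    """Collect actions allowed on an exact resource ARN (no wildcard matching)."""
    actions: Set[str] = set()
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") == "Allow" and resource in _as_list(stmt.get("Resource")):
            actions.update(_as_list(stmt.get("Action")))
    return actions


def missing_permissions(policy: dict, resource: str, access_level: str) -> List[str]:
    """Return expected actions for this access level that the policy lacks."""
    return sorted(EXPECTED_S3_ACTIONS[access_level] - granted_actions(policy, resource))
```

A WRITE share whose role only has the READ grant would then be reported as unhealthy with `["s3:PutObject"]` missing, instead of passing the existing READ-only check.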

anmolsgandhi commented 3 weeks ago

Hi, thanks for opening the issue; it will be part of the v2.7 release. cc: @dlpzx