amundsen-io / amundsen

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
https://www.amundsen.io/amundsen/
Apache License 2.0
4.42k stars 958 forks

Support RBAC model in Amundsen #854

Closed shahneil88 closed 3 years ago

shahneil88 commented 3 years ago

Expected Behavior or Use Case

Having an RBAC model can serve two purposes:

1) Currently, anyone who has access to Amundsen can access any metadata and can edit metadata descriptors. There should be a way to restrict, based on a user's role, which DBs/tables they can edit.
2) We can restrict what level of data/metadata someone can see. E.g., if there are details about a particular DB that only admins should have access to, those details can be restricted.

Service or Ingestion ETL

Frontend and metadataservice

markgrover commented 3 years ago

Thanks, @shahneil88. I have heard #1 a lot more than #2. Is that in line with how you think of their relative importance?

shahneil88 commented 3 years ago

@markgrover Yes

youcandanch commented 3 years ago

@markgrover I've actually got a compelling internal use case for #2, where we want to restrict access based on job role -- e.g. our sales staff only sees cataloged data for services they have access to (think SFDC, Marketo, etc.) while engineers have broader access that includes our more transactional stores (Snowflake, MySQL, etc.). It'd be fairly easy to handle just by spinning up two versions of Amundsen, but the redundancy isn't ideal. Feels like an extension to get_user_details would be a feasible way to tackle this -- if work hasn't started, I'd be happy to submit an RFC and eventually a PR.
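To make the role-based visibility idea concrete, here is a minimal sketch. It is not Amundsen's API: `TableRef`, `ROLE_SOURCES`, and `visible_tables` are hypothetical names, and the role-to-source mapping simply mirrors the sales/engineering example above. It assumes the user's role has already been resolved elsewhere (for instance by whatever an extended get_user_details would return).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class TableRef:
    """Hypothetical stand-in for a cataloged table: its source system and name."""
    source: str
    name: str


# Assumed mapping from job role to the source systems that role may see.
ROLE_SOURCES: Dict[str, FrozenSet[str]] = {
    "sales": frozenset({"sfdc", "marketo"}),
    "engineering": frozenset({"sfdc", "marketo", "snowflake", "mysql"}),
}


def visible_tables(role: str, tables: Iterable[TableRef]) -> List[TableRef]:
    """Return only the tables whose source the given role is allowed to see.

    Unknown roles get an empty allow-list, so they see nothing (deny by default).
    """
    allowed = ROLE_SOURCES.get(role, frozenset())
    return [t for t in tables if t.source in allowed]


catalog = [
    TableRef("sfdc", "accounts"),
    TableRef("snowflake", "orders"),
    TableRef("mysql", "users"),
]
sales_view = visible_tables("sales", catalog)
engineering_view = visible_tables("engineering", catalog)
```

With this mapping, a single Amundsen deployment could serve both audiences instead of running two instances; where the filter would actually live (frontend, metadata service, or the backend store) is left open here.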

mgorsk1 commented 3 years ago

As for #2, we also encountered a scenario in which a team (project) has tables whose column names, by naming convention, might contain sensitive information. The requirement (or nice-to-have feature) was to obfuscate such information from people outside of this team.

Another use case for #2 would be: we have a lot of official (managed) tables with curated metadata that we wouldn't want to be edited or deleted by a random user. Imagine someone changing the description of an official table that many people rely on.

Still, there are different ways to approach that. It can also be achieved at the backend level. For instance, when using Atlas as the backend, you can manage security through a combination of OIDC and Apache Ranger (Keycloak OIDC provides authn to Atlas, and Apache Ranger provides authz, so that only relevant info is displayed/returned from Atlas).

For sure, such scenarios still require Amundsen to understand different status codes so that communication to the end user is clearer (for now we just have the Something went wrong... message).
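As a rough illustration of the status-code point, the sketch below maps a few HTTP status codes that a backend might return to clearer user-facing messages. The function name `user_message` and the message strings are assumptions made for illustration, not existing Amundsen code; only the generic fallback text comes from the comment above.

```python
from http import HTTPStatus

# Hypothetical user-facing messages for the statuses an authz layer is likely to return.
MESSAGES = {
    HTTPStatus.UNAUTHORIZED: "Please sign in to view this resource.",
    HTTPStatus.FORBIDDEN: "You do not have permission to view this resource.",
    HTTPStatus.NOT_FOUND: "This resource does not exist.",
}

# The generic message mentioned above, used when nothing more specific applies.
GENERIC_MESSAGE = "Something went wrong..."


def user_message(status_code: int) -> str:
    """Translate an HTTP status code into a message for the end user."""
    try:
        status = HTTPStatus(status_code)
    except ValueError:
        # Not a status code known to the standard library.
        return GENERIC_MESSAGE
    return MESSAGES.get(status, GENERIC_MESSAGE)
```

For example, `user_message(403)` yields the permission message, while an unmapped code such as 500 falls back to the generic one.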

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] commented 3 years ago

This issue has been automatically closed for inactivity. If you still wish to make these changes, please open a new pull request or reopen this one.