apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

RBAC access rights filtering by dag tags #9342

Open xuejunteo opened 4 years ago

xuejunteo commented 4 years ago

Description

Is there a way to auto-generate permissions based on tags?

e.g. I want to automatically grant users access to a specific list of tags based on their role.

Currently only "can dag read on" and "can dag write on" are auto-populated when there is a new DAG.

It would be good if I could do something like "can dag read on tag <tag name>" and "can dag write on tag <tag name>".
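
As a rough illustration of the request (not an existing Airflow feature), tag-based grants could be resolved into per-DAG role permissions along these lines. The `ROLE_TAG_GRANTS` mapping, the role names, and the `access_control_for` helper are all hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical input: for each role, the tags it may read or edit.
ROLE_TAG_GRANTS: Dict[str, Dict[str, Set[str]]] = {
    "team_a": {"can_read": {"data-science"}, "can_edit": {"data-science"}},
    "team_b": {"can_read": {"finance", "data-science"}, "can_edit": {"finance"}},
}


def access_control_for(
    dag_tags: Iterable[str],
    grants: Dict[str, Dict[str, Set[str]]] = ROLE_TAG_GRANTS,
) -> Dict[str, Set[str]]:
    """Return {role: {action, ...}} for a DAG carrying the given tags."""
    tags = set(dag_tags)
    result: Dict[str, Set[str]] = {}
    for role, actions in grants.items():
        # An action is allowed if the DAG has at least one tag granted for it.
        allowed = {
            action
            for action, granted_tags in actions.items()
            if tags & granted_tags
        }
        if allowed:
            result[role] = allowed
    return result
```

Each returned mapping has the shape that the `access_control` argument of `DAG` takes in Airflow 2 (role name to a set of action names such as `can_read` and `can_edit`), so a helper like this could in principle be called from the DAG file itself.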

thanks in advance!

Use case / motivation

I have multiple teams using the same Airflow server, and granting access rights has been very tedious because it is currently done by manually adding individual DAGs to each team's access rights.

Related Issues

https://github.com/apache/airflow/pull/6489#issuecomment-643951458

boring-cyborg[bot] commented 4 years ago

Thanks for opening your first issue here! Be sure to follow the issue template!

sudarshan2906 commented 4 years ago

Any update on this? I want to do something similar: give access to a group of DAGs. Since nobody has picked this up, is there any other way to do it?

kaxil commented 3 years ago

Temporarily added it to 2.1 Milestone to revisit this at a later time.

Assigning permissions to a group of DAGs is a good feature -- I am not yet sure if "Tags" are the best way to do it, for the reason James already mentioned. Probably better would be some sort of "wildcard" on permissions -- not sure at the moment.

Also, we should just have a single way to add/edit permissions via UI & API -- not via DAGs themselves -- I know we currently allow it, but it is an "anti-pattern" in my opinion and is already causing some headaches.

alete89 commented 3 years ago

This is like a key feature. I can't believe we currently don't have a way to set permissions to a group of DAGs. Tags may not be the best, but I think it could work, like "You can only see DAGs tagged with data-science tag". Since this has not been added into 2.1, when are we expecting 3.0 to be launched? Can't we expect this to be merged sooner?

malthe commented 3 years ago

I agree that conflating tags (or other metadata that primarily supports the user interface) with security is a bad thing.

If the mechanism was pluggable somehow, then one strategy could be that the first directory level of the DAGs repository was mapped to a group or "container" – and RBAC roles could then in turn be assigned to those groups. It's a simple strategy that would also fit nicely in the situation where you have multiple remote repositories contributing to the DAGs repository as the first directory level.

The interface should of course be generic enough to support other strategies (including one based on tags).
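
A minimal sketch of the directory-based strategy described above, assuming POSIX-style paths. The function name `group_for_dag_file` and the example paths are hypothetical; a tag-based strategy could expose the same kind of callable.

```python
from pathlib import PurePosixPath
from typing import Optional


def group_for_dag_file(fileloc: str, dags_root: str) -> Optional[str]:
    """Map a DAG file to the first directory level below the DAGs root.

    Returns None when the file sits directly in the root or outside it.
    """
    try:
        relative = PurePosixPath(fileloc).relative_to(PurePosixPath(dags_root))
    except ValueError:
        return None  # the file is not inside the DAGs folder
    parts = relative.parts
    # A file directly in the root has only its own name as a part: no group.
    return parts[0] if len(parts) > 1 else None
```

With a root of `/opt/airflow/dags`, a file at `/opt/airflow/dags/team_a/etl/load.py` would land in group `team_a`, and RBAC roles could then be assigned per group rather than per DAG.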

potiuk commented 3 years ago

This is like a key feature. I can't believe we currently don't have a way to set permissions to a group of DAGs. Tags may not be the best, but I think it could work, like "You can only see DAGs tagged with data-science tag".

My personal opinion (and this is not a universal truth, just my personal view) is that this is not really about whether we want to use tags or something else. I think the main reason is that, in essence, the multi-tenancy feature of Airflow does not have a chance of working (yet!). By providing an "access per-group" UI feature, we would give people a completely false impression that you can isolate those DAG groups from each other. I personally believe it would be a very bad idea to tell people they can do it, while under the hood they would miss the isolation.

In short - you CAN'T currently isolate groups of DAGs in a single instance of Airflow. You simply CAN'T. It's impossible by design, and we are aiming to iterate and improve the design in the future to make it possible, but it will take a while (and likely will only be fully available in Airflow 3). There are multiple reasons why this kind of RBAC control is a UI-only feature (and can't easily be made into "execution isolation"), while "Role" separation makes sense as a UI-only feature (an Admin can access different functionality than a User via the UI).

I personally think introducing groups of DAGs is only really useful (and secure) if we are able to isolate DAG groups not only via "UI" capabilities but also via "DAG writing" capabilities. I think the reason why you want to introduce this in the UI is that there is something you want to prevent people from doing between the groups (viewing, running, etc.). I think the main reason to give UI access for a group is to give it to the same people who can write the DAGs. For me pretty much all UI "Can see DAGs in this group" access should map 1-1 to "Can write DAGs in this group". It's a bit of a simplistic view, and I can of course imagine other reasons for permission separation, but to be honest, giving people a way to "view and operate" DAGs via the UI "per-group" without doing the same for "write code in DAGs" gives a very bad false sense of security. And I think we should only introduce it when we can do it "full-stack". Otherwise people will be tempted to use current Airflow as multi-tenant, where it is in fact not multi-tenant (yet!).

Currently the only way to have multi-tenancy is to have separate Airflow instances for each tenant. Full stop. By introducing DAG grouping without going full-stack we give people the wrong impression that the story is different. It's not.

Since this has not been added into 2.1, when are we expecting 3.0 to be launched? Can't we expect this to be merged sooner?

I think it will take a while. If you feel that you MUST separate the DAGs in the UI, I'd strongly encourage you to run a single instance of Airflow per "group". This is far more robust and already proven. It provides better isolation between groups (actually infinitely better) compared to trying to run them in a single instance, which provides no isolation. Better yet, it also lets you isolate the workloads of the different groups (which is not possible at all if you run them in a single instance, where one group can submit a million tasks and affect the others). It can be nicely integrated into any authentication mechanism you have, so that only users belonging to a certain group have access to each instance. And creating and dropping "instances" for different groups can easily be automated (been there, done that with nice Terraform templates), even if you need to manage many of them.

Yeah it uses more resources, but you can also reuse a lot of those resources if you put those instances in auto-scaling K8S clusters. And you can create separate schema for each "group" in the same database to utilise your DB server better.

potiuk commented 2 years ago

We now have a multi-tenancy effort in progress, which I am leading. While the first two AIPs, which are still very much drafts (but will soon be updated), do not address this final granularity yet, they pave the way for a third AIP that is going to address this use case as well. So this is on the roadmap, planned, and part of the bigger multi-tenancy effort. @xuejunteo @sudarshan2906 @alete89 @malthe if you are interested in joining the effort, please join the Airflow Devlist and possibly the #sig-multitenancy Slack channel and take part in the discussions:

You can find the latest meeting minutes, and even a recording of the meeting where we discussed the plans for multi-tenancy, here: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-1%3A+Improve+Airflow+Security

Note that this is for a long haul - full implementation of the multitenancy (and even discussion on the AIPs) will take quite some time (several months/half a year at the least as this is a really big set of features to make it possible)

vumdao commented 2 years ago

Hope this will be released in Airflow 2.3 !

subinjp commented 2 years ago

@potiuk Could you give an update on this feature? We would like to manage DAGs from multiple aws account users in one Airflow environment. Is there any way to achieve this with existing features of airflow?

potiuk commented 2 years ago

In about 10 hours I have a talk about it at the Airflow Summit, which is happening this week. Just watch it, either live or afterwards: https://www.crowdcast.io/e/airflowsummit2022

hendrix04 commented 2 years ago

I just wanted to throw a perspective onto this thread.

I am not necessarily looking for multi-tenancy. My use case is more of, "I want people to be able to create cross dependency DAGs, but not be able to re-run a dag that they didn't create."

Maybe there is a better way of handling that scenario that I am not realizing?

potiuk commented 2 years ago

Check DAG level permissions here @hendrix04 https://airflow.apache.org/docs/apache-airflow/stable/security/access-control.html#dag-level-permissions and see if it is good for you.

If not, then in a few months we will likely have a discussion about a more sophisticated (but hopefully simpler) DAG UI permission model.

hendrix04 commented 2 years ago

@potiuk, I saw that and was looking into it.

It didn't look like that works for partial matches (either via regex or substring match). If it does then it would 100% work for my use case. If not then I think it would be a bit too onerous to try and create a rule for every DAG in the system.

If it doesn't work on partial matches, I can likely figure out a different route for my use case. Ultimately though, I wanted to share a user story with you for why someone would want to group DAGs for FGAC reasons but not have a full multi-tenant experience.
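
To make the "partial match" idea concrete: since DAG-level permissions are assigned per DAG, one workaround is to expand wildcard rules into explicit DAG id lists outside Airflow and sync those. This is only a sketch using Python's standard `fnmatch` module; `ROLE_PATTERNS`, the role names, and `expand_rules` are hypothetical and not part of Airflow.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Set

# Hypothetical wildcard rules: role -> glob patterns over DAG ids.
ROLE_PATTERNS: Dict[str, List[str]] = {
    "team_a": ["team_a_*"],
    "analysts": ["*_report", "team_a_daily"],
}


def expand_rules(
    dag_ids: Iterable[str],
    rules: Dict[str, List[str]] = ROLE_PATTERNS,
) -> Dict[str, Set[str]]:
    """Expand glob patterns into explicit per-role sets of DAG ids."""
    known = list(dag_ids)
    return {
        role: {
            dag_id
            for dag_id in known
            # Case-sensitive match against any of the role's patterns.
            if any(fnmatchcase(dag_id, pattern) for pattern in patterns)
        }
        for role, patterns in rules.items()
    }
```

The expanded sets would still have to be written to the per-DAG permissions by some periodic job, so this avoids hand-writing a rule per DAG but not the per-DAG permissions themselves.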

potiuk commented 2 years ago

Sure. I think it would be great to start the discussion on the Devlist. We finally need to start discussing how we are going to approach user management in the future, and well-rounded input with use cases and requirements will be a good start for thinking about a) whether we can do it within the current RBAC or by extending it, b) whether it should be replaced with something else, and c) how it fits into future multi-tenancy. We have not discussed that part at all yet in the multi-tenancy "stream", but I think the use case and requirement work can indeed be done in parallel to multi-tenancy, and the two might run completely in parallel as long as we know they converge at some point in time. It would actually be great if we could define a short-term approach with a long-term vision to connect both streams.

mtsadler-branch commented 1 year ago

@potiuk I wrote some python logic for dag owners and dag tags, and wrapped them in a DAG to run every 30 mins.

IDK how this relates to the long-term solution.

The owner DAG-level permissions assume that User.email matches the dag.owner field.

The tags DAG-level permissions piggyback off the owner logic by defining a "global" TAG_EMAIL_MAPPING that specifies which emails are tied to which tags.

NB: In the example DAGs I linked, I haven't implemented any logic for removing access when an owner is replaced or when TAG_EMAIL_MAPPING removes an email from the list.
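
The linked DAGs are not reproduced in this thread, so the following is not that code, only a sketch of the idea it describes, extended with the missing removal step. The mapping contents, the `dags` input shape, and both function names are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the "global" mapping described above.
TAG_EMAIL_MAPPING: Dict[str, List[str]] = {
    "finance": ["ann@example.com"],
    "ml": ["bob@example.com", "ann@example.com"],
}

Grant = Tuple[str, str]  # (user email, dag_id)


def desired_grants(
    dags: Dict[str, dict],
    tag_emails: Dict[str, List[str]] = TAG_EMAIL_MAPPING,
) -> Set[Grant]:
    """Owner gets access to their DAG; mapped emails get access per tag."""
    grants: Set[Grant] = set()
    for dag_id, meta in dags.items():
        grants.add((meta["owner"], dag_id))  # assumes owner is an email
        for tag in meta.get("tags", []):
            for email in tag_emails.get(tag, []):
                grants.add((email, dag_id))
    return grants


def plan_sync(current: Set[Grant], desired: Set[Grant]) -> Tuple[Set[Grant], Set[Grant]]:
    """Return (grants to add, grants to revoke) to reach the desired state."""
    return desired - current, current - desired
```

Computing the full desired state on every run and diffing it against the current grants is what makes revocation fall out naturally when an owner changes or an email is dropped from the mapping.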

potiuk commented 1 year ago

The long-term solution is being discussed in https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-56+Extensible+user+management devlist discussion here: https://lists.apache.org/thread/ck8dsj5w82lvr0cpwr4wlptmydqwnsqc

So you might chime in there if you feel like it. It's far more generic and abstracts away from owner/email mapping. In fact User is going to be completely removed and outsourced to external Auth system. So it's rather different.

javierpe27 commented 7 months ago

Hi everyone! Is there an ETA for this feature to be generally available?

kaxil commented 4 months ago

AIP-56 was released, @potiuk @vincbeck was this addressed or should we leave this open?

vincbeck commented 4 months ago

I would say no but ... AIP-56 has been released, so all the authentication and authorization logic is no longer in core Airflow. However, this feature request is still valid for the FAB provider. If someone really wants this feature in the FAB auth manager, it is possible.

But ... the direction we are aiming for in Airflow 3 is to no longer use the FAB auth manager, and from the description of this feature request, I'd say implementing such a thing in the FAB auth manager would be messy. This is one of the reasons we want to move away from the FAB auth manager: flexibility. So the best solution for this feature request would be to create a new auth manager leveraging a tool that supports such a feature. And if this tool is generic and simple enough, we could even use it as the default auth manager in Airflow 3.

potiuk commented 4 months ago

I would agree here with @vincbeck. Implementing "Really Simple Auth Manager" where we do not have all the complexity of FAB but we have 2-3 predefined and NON CONFIGURABLE roles and (for example) very simple team setup with multi-team support could be the right way to implement this one in a better way.
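
Purely as an illustration of the idea (a few fixed roles plus a simple team check), and not the actual design of any Airflow auth manager, the authorization rule could be as small as the sketch below. `Role`, `SimpleUser`, and `can_access_dag` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """A fixed, non-configurable set of roles."""
    VIEWER = "viewer"
    USER = "user"
    ADMIN = "admin"


@dataclass(frozen=True)
class SimpleUser:
    name: str
    role: Role
    team: Optional[str] = None  # None means the user belongs to no team


def can_access_dag(user: SimpleUser, dag_team: Optional[str], write: bool = False) -> bool:
    """Decide access from the user's role and the team that owns the DAG."""
    if user.role is Role.ADMIN:
        return True  # admins see and edit everything
    if dag_team is not None and user.team != dag_team:
        return False  # DAG belongs to another team
    # Within the team (or for team-less DAGs): users may write, viewers only read.
    return user.role is Role.USER or not write
```

Keeping the roles fixed means there is nothing to configure per DAG: the only per-DAG attribute is the owning team.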

geronimo-iia commented 1 month ago

Hello,

From a finops perspective, we share a single airflow instance for multiple teams. With an ABAC/RBAC system, our goal is just to drive the UI, like displaying dags belonging to a team, etc... nothing more.

In fact, we use a manifest file that defines the group. And for each group, we have a list of approved repositories and tags.

With this file, we implemented a Dag policy that checks:

We integrated the AWS Identity Center using Fab Auth and therefore using a policy store. Currently, since we can only implement the RBAC restriction, we need to list all deployed DAGs to set these permissions... a nightmare :p We just want to add the tags attribute to the entity and filter on it (like managed services do on AWS). It's still at the PoC stage on our dev stations, nothing works for the moment... :(

I agree that it would be easier to manage this segregation with one airflow instance per team. But it depends on your organization. For us, we would have to manage 100 instances with 5 to 10 users per team... Each instance with its web server, redis, workers, scheduler etc... too expensive.

Unless we can just have one web server per team that shares the same scheduler, worker, etc. but how to organize the log and tag folders that must be common?

Personally, I think it's a good thing to have taken the authentication code out of the airflow database. This allows easier integration with business constraints.

Maybe for the ABAC/RBAC part we should do the same, and use something like https://github.com/casbin/pycasbin

My 2 cents :)

Regards Jerome

potiuk commented 1 month ago

Maybe for the ABAC/RBAC part we should do the same, and use something like https://github.com/casbin/pycasbin My 2 cents :)

Yes, this is exactly the plan. We are evaluating Casbin, Keycloak and one other thing (can't remember) for Airflow 3, and the Auth Manager for multi-team access in the future (Airflow 3) will leverage this rather than FAB RBAC.