apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Add a CRD-sync on k8s #19250

Open cdmikechen opened 2 years ago

cdmikechen commented 2 years ago

Description

Airflow uses git-sync to pull DAGs from a Git repository. I'm thinking about doing this in a more cloud-native way, using a k8s custom resource definition (CRD). We could move DAGs, files, and other content into custom resources. An operator running in a container could then synchronize them instantly to each Airflow pod.
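To make the idea concrete, here is a minimal sketch of the sync step only. It is not the code of any existing operator. The function name `reconcile_dag_folder` and the `desired` mapping (file name to file content) are hypothetical; `desired` stands in for whatever a DAG custom resource would carry, and a real operator would receive it from a Kubernetes watch event rather than from a plain dict. The sketch assumes flat file names (no subdirectories) and treats every `*.py` file in the folder as managed.

```python
from __future__ import annotations

from pathlib import Path


def reconcile_dag_folder(desired: dict[str, str], dag_folder: Path) -> dict[str, list[str]]:
    """Make dag_folder match `desired` (file name -> file content).

    `desired` is a stand-in for the content of a hypothetical DAG custom
    resource; this function only covers the "write it into the DAG folder" part.
    """
    dag_folder.mkdir(parents=True, exist_ok=True)
    changes: dict[str, list[str]] = {"written": [], "removed": [], "unchanged": []}

    for name, content in sorted(desired.items()):
        target = dag_folder / name
        if target.is_file() and target.read_text(encoding="utf-8") == content:
            changes["unchanged"].append(name)
            continue
        # Write to a temporary file and rename it, so that a reader of the
        # folder never sees a half-written DAG file.
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_text(content, encoding="utf-8")
        tmp.replace(target)
        changes["written"].append(name)

    # Remove DAG files that are no longer declared in the resource.
    for existing in sorted(dag_folder.glob("*.py")):
        if existing.name not in desired:
            existing.unlink()
            changes["removed"].append(existing.name)

    return changes


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp_dir:
        folder = Path(tmp_dir) / "dags"
        print(reconcile_dag_folder({"example.py": "print('hello')\n"}, folder))
        # Calling it again with the same content changes nothing.
        print(reconcile_dag_folder({"example.py": "print('hello')\n"}, folder))
```

Running the reconcile step on every change event, instead of applying individual diffs, keeps it idempotent: a pod that missed an event ends up with the same folder content after the next one.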

Use case/motivation

I have created a project on GitHub to implement this idea: https://github.com/cdmikechen/airflow-dag-operator. It starts a synchronization service, built as a Quarkus operator, on each Airflow pod to synchronize the DAGs/files into the DAG folder.

Related issues

No response

Are you willing to submit a PR?

Code of Conduct

boring-cyborg[bot] commented 2 years ago

Thanks for opening your first issue here! Be sure to follow the issue template!

potiuk commented 2 years ago

My first point is - cool way of doing it.

But I think it does not fit in the Airflow code. It requires a dependency on Quarkus (as I understand it), and if we make it part of the Helm chart (that is where it fits best, as this is the only link we have with K8S deployment), it introduces a new "option" for DAG syncing, which I think moves us a bit too close to being K8S-tied. It also lacks testing - all the code we have is heavily tested automatically, in all the variants and configurations. For one, our Helm Chart is fully tested with unit and integration tests, and any code incorporated into Airflow will have to be equally well tested.

Also look at our talk with @kaxil: https://airflowsummit.org/sessions/2021/airflow-loves-kubernetes/ - while Airflow Loves Kubernetes, K8S is only one of many deployment options for us. Using GitSync is a bit more "agnostic", and unless there is a very good reason why the CRD way could be better, I see no reason why it should be incorporated by the community. The fact that something is "cloud native" does not automatically mean "it's better" - that is a bit of a misconception. In Airflow's world, being "cloud native" carries the connotation of "we can only target part of our users this way", which often means it's not worth the effort unless it REALLY brings some benefits.

I think it better fits the https://airflow.apache.org/ecosystem/ page to be honest - if someone wants to use it, they could (and it's as easy as adding a link to your Repo there via PR).

@jedcunningham @kaxil @dimberman - WDYT?

cdmikechen commented 2 years ago

@potiuk Thanks for your advice.

Synchronizing DAGs with Git and synchronizing them with a k8s operator have something in common: both rely on tools outside the Airflow project. Without production-level testing and running examples, it is hard for people to see such a tool as reliable.

When I first deployed Airflow on k8s, I felt that deploying a separate resource service, independent of k8s, just to synchronize DAGs was an extra task. Moreover, not every k8s cluster comes with its own Git service. In my opinion, for a k8s-based developer/user, using a CRD for DAG/file synchronization on k8s may be more convenient than using Git.

In essence, this approach does not change the way DAGs are used, but it gives users who do not have a Git repository another choice for deploying on the cloud.

potiuk commented 2 years ago

> In essence, this approach does not change the way DAGs are used, but it gives users who do not have a Git repository another choice for deploying on the cloud.

This is all fine and I agree with giving people more options. I do not argue with that (actually, I do not argue at all - I ask others in the community for their opinion and present my own).

However, there is a huge difference in how the options are provided to the user:

1) if it is provided as ecosystem/external - the person/organisation who provides it handles all the support: answering user questions, solving problems, customisations.

2) if it is provided by the Apache Airflow Community - it is the community that provides the support, releases working software, fixes problems and generally supports the users.

Both are viable options. Both have happened in the past for various parts of Airflow (and various ecosystem components). There is no "one is better than the other" - they simply have different characteristics. And sometimes even those who provide the "option" or "alternative" keep it as an "ecosystem" part, because they want to keep their own pace of changes and develop it further (example - the user-community chart).

For the Apache Airflow community - following the Apache Way http://www.apache.org/theapacheway/ - any new option like that adds more combinations to develop, test, and keep working for the users. That's why anything that we accept as a contribution MUST at least have automated tests, and we have to be sure it brings value that offsets the cost of maintaining it.

Also, when software brings dependencies, we must be sure we are OK with them - licence-wise, stability-wise, and in terms of future compatibility and maintenance. In this particular case, Quarkus (which I am hearing of for the first time) is a Java-based dependency which we have almost no experience with as a community. Any fixes/problems there might require skills that are a bit outside of our domain.

I do not know whether those are serious problems or blockers - I am just pointing out the areas that are potentially problematic, and that such an "add new option for users" change might have some serious implications that we should consider.

Accepting a "feature" or "code" is, more often than you think, a "liability" rather than an "asset".

I am not saying that is the case here. But certainly, if we go in this direction, a comprehensive set of unit tests is an absolute MUST - without it, it is a "no-go" no matter what.

So I think we should wait for others to chime in (it might take days or weeks depending on how busy they are) - so certainly do not expect immediate action here.