Closed — elgalu closed this issue 4 years ago
I'm relatively new to this tool myself, but here are some initial observations based on my experience:
Also, to unpack the "heterogeneous runtimes" piece a bit further: Airflow has a huge list of "Operators" with support for other runtimes like Bash, Spark, Hive, etc., but the business logic for the Operators themselves is all written in Python:
https://airflow.apache.org/code.html?highlight=operators
And many of them may have some environmental dependencies that you'll need to configure outside Airflow's setup to get working.
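To make the "Operators are written in Python" point concrete, here is a minimal sketch of the pattern: a Python class with an `execute()` hook that drives a non-Python runtime (here, Bash). The class names and the trivial `BaseOperator` are illustrative stand-ins, not Airflow's actual implementation:

```python
# Sketch of the Operator pattern: the orchestration logic is Python,
# even when the runtime doing the work (Bash, Spark, Hive, ...) is not.
import subprocess


class BaseOperator:
    """Illustrative stand-in for Airflow's BaseOperator."""

    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self, context):
        raise NotImplementedError


class EchoBashOperator(BaseOperator):
    """Hypothetical operator that shells out, in the style of a BashOperator."""

    def __init__(self, task_id, bash_command):
        super().__init__(task_id)
        self.bash_command = bash_command

    def execute(self, context):
        # The Python class orchestrates; the actual work runs in Bash.
        result = subprocess.run(
            self.bash_command, shell=True, capture_output=True, text=True
        )
        return result.stdout.strip()


op = EchoBashOperator(task_id="say_hello", bash_command="echo hello")
print(op.execute(context={}))  # hello
```

The environmental-dependency point follows directly: whatever `bash_command` (or Spark cluster, or Hive connection) the operator drives must already exist on the worker, outside anything Airflow itself installs.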
They do seem to have a DockerOperator, which probably provides many of the same facilities as Argo for scheduling Docker executions on a single host. I'm not certain, though, that it comes with all of the facilities Kubernetes offers for managing and scheduling containerized workloads (e.g. pod abstractions, config maps, secrets management, centralized logging, host node selectors, affinity and anti-affinity, etc.).
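As a rough illustration of the single-host point: a DockerOperator-style task ultimately boils down to one `docker run` invocation on the worker. The helper below (entirely made up for illustration, not Airflow's API) just assembles that command line; notice that none of the cluster-level features listed above have an equivalent at this layer:

```python
# Sketch: a single-host containerized task is essentially one
# `docker run` command. Cluster features (secrets management,
# config maps, node selectors, affinity) have no equivalent here.
def build_docker_run(image, command, env=None, volumes=None):
    """Hypothetical helper that assembles the argv for `docker run`."""
    argv = ["docker", "run", "--rm"]
    for key, value in (env or {}).items():
        argv += ["--env", f"{key}={value}"]
    for host_path, container_path in (volumes or {}).items():
        argv += ["--volume", f"{host_path}:{container_path}"]
    argv.append(image)
    argv += command
    return argv


argv = build_docker_run(
    image="python:3.9-slim",
    command=["python", "-c", "print('step done')"],
    env={"STAGE": "dev"},
    volumes={"/data": "/mnt/data"},
)
print(" ".join(argv))
```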
Thanks @divideby0, I think you described the differences better than I could have, given my limited knowledge of Airflow. Some additional points:
Airflow natively schedules steps to run in a Kubernetes cluster, potentially across several hosts
I would highlight this as a similarity. Argo only works in the context of Kubernetes, where each step is a Kubernetes pod. It therefore integrates very deeply with a Kubernetes environment, utilizing nearly all of the features of a k8s pod spec (e.g. secrets/configmap mounts, volumes, resource limits, pod affinity, etc.). Scheduling of pods is deferred to Kubernetes, and each step runs on whatever host k8s decides to schedule its pod on (obeying any affinity rules set in the step).
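To illustrate the "nearly all of the pod spec" point, here is a sketch of the core/v1 PodSpec fields in question, written as the plain dict the Kubernetes API accepts. The field names are from the Kubernetes API; all the concrete values (names, image, limits) are made-up examples:

```python
# Sketch of the Kubernetes PodSpec features mentioned above, as the
# dict the k8s API accepts. An Argo step is ultimately one such pod,
# and scheduling it is deferred to Kubernetes.
pod_spec = {
    "nodeSelector": {"disktype": "ssd"},  # host node selectors
    "affinity": {
        "podAntiAffinity": {},  # (anti-)affinity rules; details elided
    },
    "volumes": [
        {"name": "cfg", "configMap": {"name": "my-config"}},  # configmap mount
    ],
    "containers": [
        {
            "name": "main",
            "image": "alpine:3.12",
            "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
            "env": [
                {   # secrets management via the pod spec
                    "name": "DB_PASSWORD",
                    "valueFrom": {
                        "secretKeyRef": {"name": "db-creds", "key": "password"}
                    },
                }
            ],
            "volumeMounts": [{"name": "cfg", "mountPath": "/etc/cfg"}],
        }
    ],
}
print(sorted(pod_spec))  # ['affinity', 'containers', 'nodeSelector', 'volumes']
```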
I should point out that we created an example of an Argo workflow which actually utilizes Airflow operators, since we understand the desire to leverage the huge library of Airflow operators built up over time, but done in a more k8s-centric way:
Thanks Jesse! To clarify, I meant to say Argo natively schedules steps to run on a Kubernetes cluster. I don't believe standalone Airflow has native Kubernetes support yet. That was a typo on my part.
What is the learning curve for Argo compared to Airflow? And as beginners, what challenges might we face?
I'd like to add that since version 1.10.0, Airflow provides a Kubernetes Executor which allows scheduling jobs directly as Kubernetes pods.
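For reference, the executor is selected in `airflow.cfg`. The fragment below reflects the 1.10-era configuration (exact option names may differ between Airflow versions):

```ini
[core]
# Run each task as its own Kubernetes pod instead of on
# local/Celery workers (available since Airflow 1.10.0).
executor = KubernetesExecutor
```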
@pierorex it looks like the Kubernetes Executor spawns a single pod for a simple job using KubernetesPodOperator. What about scheduling a long-running Service or StatefulSet? How does Airflow handle that natively?
@anoasis I don't think Airflow should be used for those types of tasks. IMO, Airflow tasks should run for a limited time, up to a couple of days maybe. I don't think they're supposed to be running as a service that is highly available for other apps to use.
I would add that, at least with the version of Argo we work with, most of the work is done via the CLI — both because the Argo CLI is great and because Argo's UI isn't as polished as Airflow's.
This short video, admittedly partial to Argo, addresses the question:
Regarding the most commonly used workflow language, the Common Workflow Language (CWL):
Argo: no CWL support
Airflow: supports CWL
@qins do you mean https://github.com/Barski-lab/cwl-airflow, or that it natively supports it?
@elgalu cwl-airflow
Forgive my ignorance, but could you summarise how this project compares with Apache Airflow?