argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Compare to Apache Airflow #849

Closed elgalu closed 4 years ago

elgalu commented 6 years ago

Forgive my ignorance, but could you summarise how this project compares with Apache Airflow?

divideby0 commented 6 years ago

I'm relatively new to this tool myself, but some initial observations based on my experience:

Similarities

Differences

Edit: fixed a typo

divideby0 commented 6 years ago

Also, to unpack the "heterogeneous runtimes" piece a bit further: Airflow has a huge list of "Operators" with support for other runtimes like Bash, Spark, Hive, etc. But the business logic for the Operators themselves is all written in Python:

https://airflow.apache.org/code.html?highlight=operators

And many of them may have some environmental dependencies that you'll need to configure outside Airflow's setup to get working.

They do seem to have a DockerOperator, which probably provides many of the same facilities as Argo for scheduling docker executions on a single host, but I'm not certain it comes with all the same facilities that Kubernetes offers for managing and scheduling containerized workloads (e.g. pod abstractions, config maps, secrets management, centralized logging, host node selectors, affinity and anti-affinity, etc.).

jessesuen commented 6 years ago

Thanks @divideby0, I think you described the differences better than I could have, given my limited knowledge of Airflow. Some additional points:

"Airflow natively schedules steps to run in a Kubernetes cluster, potentially across several hosts"

I would highlight this as a similarity. Argo only works in the context of Kubernetes, where each step is a Kubernetes pod. Thus it integrates very deeply into a Kubernetes environment, utilizing nearly all of the features in a k8s pod spec (e.g. secrets/configmap mounts, volumes, resource limits, pod affinity, etc.). Scheduling of pods is deferred to Kubernetes, and each pod will run on whatever host k8s decides to schedule it on (obeying any affinity rules set in the step).
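As a rough illustration of the "each step is a pod" point, here is a minimal sketch that builds a one-step Workflow manifest as a plain Python dictionary and prints it as JSON. The image, the template name, the secret name `my-secret`, the `disktype: ssd` node label and the resource limits are all made-up values for illustration, not something taken from this thread.

```python
import json

# Illustrative values only: the image, names, labels and limits below are
# assumptions chosen for the example.
workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "pod-per-step-"},
    "spec": {
        "entrypoint": "say-hello",
        "templates": [
            {
                "name": "say-hello",
                # A container template like this one runs as a single pod.
                "container": {
                    "image": "alpine:3.19",
                    "command": ["sh", "-c"],
                    "args": ["echo hello from a pod"],
                    # Ordinary pod-spec features are available per step.
                    "resources": {
                        "limits": {"cpu": "500m", "memory": "128Mi"}
                    },
                    "env": [
                        {
                            "name": "API_TOKEN",
                            "valueFrom": {
                                "secretKeyRef": {
                                    "name": "my-secret",  # hypothetical secret
                                    "key": "token",
                                }
                            },
                        }
                    ],
                },
                # Placement is left to Kubernetes, constrained by this selector.
                "nodeSelector": {"disktype": "ssd"},
            }
        ],
    },
}

manifest = json.dumps(workflow, indent=2)
print(manifest)
```

Since kubectl accepts JSON manifests, the printed output could be saved to a file and created on a cluster that already has Argo installed. The step itself carries the resource limits, secret reference and node selector, which is the pod-level integration described above.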

I should point out that we created an example of an Argo workflow which actually utilizes Airflow operators. We understand the desire to leverage the huge library of Airflow operators that has been built up over time, but wanted to do so in a more k8s-centric way:

https://github.com/argoproj/data-pipeline

divideby0 commented 6 years ago

Thanks Jesse! To clarify, I meant to say Argo natively schedules steps to run on a Kubernetes cluster. I don't believe standalone Airflow has native Kubernetes support yet. That was a typo on my part.

vimox-shah commented 5 years ago

What is the learning curve for Argo compared to Airflow? And as beginners, what challenges might we face?

pierorex commented 5 years ago

I'd like to add that since version 1.10.0, Airflow provides a Kubernetes Executor which allows scheduling jobs directly as Kubernetes pods.
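For anyone wanting to try it, here is a small sketch of how the executor choice is usually expressed, assuming the standard `executor` key in the `[core]` section of `airflow.cfg` and the equivalent `AIRFLOW__CORE__EXECUTOR` environment variable. The helper names `executor_config` and `executor_env` are made up for this example, and a real deployment needs further Kubernetes-specific settings that are not shown here.

```python
import configparser
import io


def executor_config(executor: str = "KubernetesExecutor") -> str:
    """Render an airflow.cfg fragment that selects the given executor."""
    cfg = configparser.ConfigParser()
    cfg["core"] = {"executor": executor}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def executor_env(executor: str = "KubernetesExecutor") -> dict:
    """Return the equivalent setting as an environment variable mapping."""
    # Airflow reads overrides named AIRFLOW__<SECTION>__<KEY>.
    return {"AIRFLOW__CORE__EXECUTOR": executor}


print(executor_config())
print(executor_env())
```

Either form only selects the executor; how each task pod is built (image, namespace, and so on) is configured separately.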

anoasis commented 5 years ago

@pierorex it looks like the Kubernetes Executor spawns a single pod for a simple job using KubernetesPodOperator. How about scheduling a long-running Service or StatefulSet? How does Airflow handle that natively?

pierorex commented 5 years ago

@anoasis I don't think Airflow should be used for those types of tasks. IMO, Airflow tasks should run for a limited time, up to a couple of days maybe. I don't think they're supposed to be running as a service that is highly available for other apps to use.

orcaman commented 5 years ago

I would add that at least with the version of Argo we work with, most of the work is done via the CLI, because the Argo CLI is great, and because Argo's UI wasn't that great compared to Airflow's UI.

antvalencia commented 5 years ago

This short video, partial to Argo, addresses this question here:

https://www.youtube.com/watch?v=oXPgX7G_eow&t=5m57s

qins commented 4 years ago

For the most commonly used workflow language, the Common Workflow Language (CWL):

Argo: no support
Airflow: supports CWL

elgalu commented 4 years ago

@qins do you mean https://github.com/Barski-lab/cwl-airflow or that it natively supports it?

qins commented 4 years ago

@elgalu cwl-airflow