flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

What is the difference from Luigi? #660

Closed pietermarsman closed 3 years ago

pietermarsman commented 3 years ago

Hi there,

I'm considering multiple tools to orchestrate a machine-learning training and deployment pipeline. Up until now I've been looking extensively at Argo, Prefect, Luigi and Airflow.

I'm not sure what the added benefits of Flyte are compared to e.g. Luigi. Can you add more documentation about how Flyte compares to other similar workflow managers?

kumare3 commented 3 years ago

@pietermarsman thank you for the question; this is one of the things we will be adding soon. I will let the Spotify folks @honnix / @kanterov / @narape - who are now using Flyte extensively - answer in more detail. But here is a quick summary IMO:

Model: Luigi is a Python-based workflow engine that has data flow as a first-class citizen. The scheduler runs locally in the same process as the workflow, so there is no distributed fault tolerance or hosted experience possible. Flyte is a specification-based workflow engine. It borrows ideas from both Airflow and Luigi in terms of extensibility - you can easily add new extensions, like Airflow operators. It is also a dataflow-based orchestrator, so it deeply understands the data flowing through the system. The big difference is that workflows can be authored in any language, converted to a common protobuf-based specification, and uploaded to the Flyte service. From then on, execution happens on a Kubernetes cluster. You can visualize the DAGs and execute them from a UI or a CLI.
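To make the authoring model concrete, here is a minimal sketch of what the Python side looks like, assuming a recent flytekit release; the task and workflow names are illustrative only:

```python
from flytekit import task, workflow


@task
def train(learning_rate: float) -> float:
    # Placeholder for real training; returns a validation score.
    return 1.0 - learning_rate


@task
def evaluate(score: float) -> str:
    return "ship it" if score > 0.5 else "keep tuning"


@workflow
def training_pipeline(learning_rate: float = 0.1) -> str:
    # The typed outputs of one task flow into the next; these edges are what
    # get captured in the protobuf specification registered with the service.
    return evaluate(score=train(learning_rate=learning_rate))
```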

Scale: Flyte is a distributed scheduler and schedules pipelines using a variety of backend plugins - on Kubernetes (pods, containers, Spark jobs, TensorFlow training, etc.) and on other hosted services like EMR, Databricks, AWS Batch, etc. The scheduler is fault tolerant and resilient to machine crashes. At Lyft we run more than a million pipelines a month and have more than 10k unique pipelines.
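As a rough sketch of how a backend plugin is selected per task (assuming the optional flytekitplugins-spark package is installed; the Spark settings are placeholders):

```python
from flytekit import task
from flytekitplugins.spark import Spark


@task(task_config=Spark(spark_conf={"spark.executor.instances": "2"}))
def aggregate(num_rows: int) -> int:
    # With this task_config, Flyte's Spark plugin launches the task as a
    # Spark job on the Kubernetes cluster instead of a plain container.
    return num_rows * 2
```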

Versioning: Flyte versions all workflows and tasks (a task is an individual execution unit in a workflow; the workflow is the DAG). So you can go back in time to any execution, retrieve its outputs, and run it again immediately (as long as the containers are available). All inputs and outputs are also cached and can be retrieved from one central API using the various clients or the UI.
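For illustration, retrieving the outputs of a past execution through that central API could look roughly like this (a sketch assuming a recent flytekit with FlyteRemote; the project, domain and execution names are placeholders):

```python
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

remote = FlyteRemote(
    config=Config.auto(),
    default_project="flytesnacks",
    default_domain="development",
)

# Fetch a past execution by name and read its recorded outputs.
execution = remote.fetch_execution(
    project="flytesnacks", domain="development", name="f1234abcd"
)
execution = remote.sync(execution)
print(execution.outputs)
```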

Match the developer's workflow: Users can write a single task, execute that single task, and, once happy, combine multiple tasks into a workflow, which can then have multiple schedules. Developers can be notified on success / failure of a workflow or task.
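A small sketch of that iteration loop, assuming flytekit's LaunchPlan and CronSchedule APIs; the names and cron expression are examples only:

```python
from flytekit import CronSchedule, LaunchPlan, task, workflow


@task
def say_hello(name: str) -> str:
    return f"hello {name}"


@workflow
def hello_wf(name: str = "flyte") -> str:
    return say_hello(name=name)


if __name__ == "__main__":
    # A task is a plain Python callable, so it can be exercised locally
    # before it is ever wired into a workflow.
    print(say_hello(name="world"))

# Once the workflow exists, schedules are attached through launch plans;
# several launch plans (and schedules) can point at the same workflow.
nightly = LaunchPlan.get_or_create(
    workflow=hello_wf,
    name="hello_wf_nightly",
    schedule=CronSchedule(schedule="0 2 * * *"),
)
```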

Easy GitOps: At Lyft, for every new commit to a repo, we build all the workflows and tasks in that repository, which makes it possible to refer back to the code. This does not mean you have to build containers; for quick iteration we have a fast-register mode that makes it possible to iterate on the code directly from the laptop in a matter of seconds.

Catalog caching and lineage tracking: Every execution and its intermediate steps are recorded. Thus, if the same execution is observed again - same version of the code, identical inputs (for deterministic algorithms) - Flyte will re-use the outputs from a previous execution. This makes it possible to fix bugs in a DAG without having to redo all of the computation.
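In flytekit this is opted into per task with the cache options; a minimal sketch (the task body is a placeholder):

```python
from flytekit import task


@task(cache=True, cache_version="1.0")
def expensive_step(n: int) -> int:
    # If this version of the task has already run with the same input,
    # Flyte serves the cached output instead of re-executing the task.
    return sum(i * i for i in range(n))
```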

UI/CLI/SDKs: The Python SDK or Java SDK can be used to author workflows and tasks. Here are some examples of using the Python SDK - https://flytecookbook.readthedocs.io/en/latest/auto_recipes/index.html. The UI automatically generates a form to handle all inputs, and because Flyte natively understands the data, you can also use the CLI to interact with all your workflows.

Write your own plugins (Flytekit Python or backend): Flytekit allows you to simply write new operators in Python that users can use. But you can also extend the Flyte backend to add platform-level capabilities - this makes it possible to write global extensions within a company, which can be deployed without having to change the client libraries, etc.
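As a simplified illustration only (the real flytekit plugin interfaces involve subclassing its task classes), a Python-level "operator" can be as small as a task published from a shared internal library that other teams import into their workflows; the names below are hypothetical:

```python
from flytekit import task


@task
def publish_metrics(table: str, metric: str) -> str:
    # A real operator would push to the company's metrics store here;
    # downstream teams just import this task and call it in their workflows.
    return f"published {metric} for {table}"
```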

Backend: The scheduler and backend are all written in Golang.

kumare3 commented 3 years ago

Also, please join our Slack channel here and feel free to ping me; we can have a longer discussion.

kumare3 commented 3 years ago

Also, look at https://flytecookbook.readthedocs.io/en/latest/ to see the new Python programming model.

honnix commented 3 years ago

@kumare3 Thanks for answering.

Luigi is a Python-based workflow engine that has data flow as a first-class citizen. The scheduler runs locally in the same process as the workflow.

Small corrections here.

And yes, Luigi runs all tasks deduced from an entry task in the same process (or, to be more precise, on the same computer, because Luigi may fork multiple processes to run tasks in parallel).

kumare3 commented 3 years ago

@pietermarsman does this help?

kumare3 commented 3 years ago

@pietermarsman I am closing this issue for now. Thank you. Please re-open if you want any more clarifications.