getindata / kedro-airflow-k8s

Kedro Plugin to support running pipelines on Kubernetes using Airflow.
https://kedro-airflow-k8s.readthedocs.io
Apache License 2.0

DagRun pod for pipeline (QUESTION) #70

Open teaglebuilt opened 3 years ago

teaglebuilt commented 3 years ago

Instead of launching a pod per node, could I launch a single pod for the entire DAG run with the LocalExecutor? If so, has anyone tried this?

Some of my tasks finish faster than a pod can spin up, so I was hoping to have one pod per DAG / DAG run.
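A rough back-of-the-envelope model makes this trade-off concrete. It is a minimal sketch that assumes the nodes run as a sequential chain and that every pod pays the same fixed startup cost. The function name and all numbers are hypothetical, not measurements from the plugin.

```python
from typing import Sequence


def total_wall_time(durations_s: Sequence[float], pod_startup_s: float, grouped: bool) -> float:
    """Rough wall-clock estimate for a sequential chain of nodes (hypothetical model).

    grouped=False: one pod per node, so every node pays the startup cost.
    grouped=True: one pod for the whole chain (e.g. a whole DAG run), paid once.
    Scheduler latency, variable image pulls, and parallelism are ignored.
    """
    if not durations_s:
        return 0.0
    pods_started = 1 if grouped else len(durations_s)
    return sum(durations_s) + pods_started * pod_startup_s


# Hypothetical numbers: three quick nodes and a 20 s pod startup.
nodes = [2.0, 3.0, 5.0]
print(total_wall_time(nodes, 20.0, grouped=False))  # 70.0
print(total_wall_time(nodes, 20.0, grouped=True))   # 30.0
```

With these made-up numbers, startup dominates the pod-per-node layout. The price of grouping is that the nodes no longer show up as separate Airflow tasks with their own logs.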

em-pe commented 3 years ago

Hi @teaglebuilt. Thanks for the question, it's a good one!

We've noticed that spawning a pod for every node is overkill. What we've considered instead is using Kedro node tags to group node executions within a single pod.

I'm keeping this ticket open until we open a follow-up issue with implementation guidelines.
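Tag-based grouping could look roughly like the sketch below. The `pod:` tag prefix, `NodeInfo`, and `group_nodes_into_pods` are hypothetical conventions, not part of kedro-airflow-k8s. The function only reads `.name` and `.tags`, which Kedro `Node` objects also expose. The small stand-in class keeps the example runnable without Kedro installed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class NodeInfo:
    """Minimal stand-in for a Kedro node: just a name and its tags."""
    name: str
    tags: FrozenSet[str] = frozenset()


def group_nodes_into_pods(nodes: Iterable[NodeInfo], group_prefix: str = "pod:") -> Dict[str, List[str]]:
    """Map each pod key to the node names it should execute (hypothetical convention).

    Nodes carrying a tag that starts with `group_prefix` share one pod keyed by
    that tag (the alphabetically first one if several match). Every other node
    keeps its own pod, keyed by the node name. Input order is preserved.
    """
    pods: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        group_tags = sorted(t for t in node.tags if t.startswith(group_prefix))
        key = group_tags[0] if group_tags else node.name
        pods[key].append(node.name)
    return dict(pods)


nodes = [
    NodeInfo("load", frozenset({"pod:prep"})),
    NodeInfo("clean", frozenset({"pod:prep", "slow"})),
    NodeInfo("train"),
]
print(group_nodes_into_pods(nodes))
# {'pod:prep': ['load', 'clean'], 'train': ['train']}
```

Grouping says nothing about dependencies. If the input is in topological order, each pod's list keeps that order. However, a group must not depend on an outside node that itself depends on the group, or the pod-level DAG would contain a cycle.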

teaglebuilt commented 3 years ago

@em-pe Yep, and spawning a pod per node is also a great solution that we can use as well. However, some nodes may finish faster than it takes to spin up a pod, yet you still want visibility into each of those nodes, so you do not want to group them into another execution process.

For that reason, I have looked at either telling the KubernetesPodOperator to apply the same logic as reattach_on_restart, directing a task back to the same pod for execution.

Or mixing the logic of the SubDagOperator with the KubernetesPodOperator, which I anticipate would create bottlenecks in the scheduler. That said, the scheduler can scale now, so it might not be a problem.
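The first option, sending every task of a DAG run back to the same pod, can be modelled as a label lookup, loosely in the spirit of how reattach_on_restart finds an existing pod by its labels. This is an in-memory simulation, not the KubernetesPodOperator API: `FakeCluster`, `pod_for_task`, and the label keys are hypothetical. Actually running a second task inside an already running pod is outside what this sketch covers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeCluster:
    """In-memory stand-in for the Kubernetes API: maps pod name -> labels."""
    pods: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def find_pod(self, selector: Dict[str, str]) -> Optional[str]:
        """Return the first pod whose labels contain every selector key/value."""
        for name, labels in self.pods.items():
            if all(labels.get(key) == value for key, value in selector.items()):
                return name
        return None

    def create_pod(self, name: str, labels: Dict[str, str]) -> str:
        self.pods[name] = dict(labels)
        return name


def pod_for_task(cluster: FakeCluster, dag_id: str, run_id: str, task_id: str,
                 share_per_run: bool = True) -> str:
    """Return the pod a task should run in, creating it on first use (hypothetical helper).

    share_per_run=True keys the lookup on (dag_id, run_id) only, so all tasks of
    one DAG run land in the same pod. share_per_run=False also keys on task_id,
    which gives the usual one-pod-per-task layout.
    """
    selector = {"dag_id": dag_id, "run_id": run_id}
    if not share_per_run:
        selector["task_id"] = task_id
    existing = cluster.find_pod(selector)
    if existing is not None:
        return existing
    return cluster.create_pod("-".join(selector.values()), selector)


cluster = FakeCluster()
first = pod_for_task(cluster, "etl", "run1", "extract")
second = pod_for_task(cluster, "etl", "run1", "transform")
print(first == second, len(cluster.pods))  # True 1
```

Dropping task_id from the lookup key is the whole design choice here. It trades per-task isolation for a single startup cost per DAG run.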