Open raphaelauv opened 1 year ago
Absolutely! You are a little early - not a full release yet 😄
I am working on the docs and a few finishing touches for the full 1.0 release here, let me know if you have comments/suggestions.
We'll likely announce it more broadly soon
"Is this just a KPO Helper?" Yes, but... Airflow is just fancy cron when you think about it 😁
Summary
The Isolation Provider provides the IsolatedOperator and the isolationctl CLI. It provides the capacity to run any Airflow Operator in an isolated fashion.
Why use IsolatedOperator?
- Run a different version of an underlying library, between separate teams on the same Airflow instance, even between separate tasks in the same DAG
- Run entirely separate versions of Python for a task
- Keep "heavy" dependencies separate from Airflow
- Run an Airflow task in a completely separate environment - or on a server all the way across the world.
- Run a task "safely" - separate from the Airflow instance
- More easily run a task whose dependencies conflict with those of the Airflow instance
- Do all of the above while having unmodified access to (almost) all of 'normal' Airflow - operators, XCOMs, logs, deferring, callbacks.
What does the isolationctl provide?
- The isolationctl gives you an easy way to manage "environments" for your IsolatedOperator
When shouldn't you use IsolatedOperator?
- If you can use un-isolated Airflow Operators, you still should use un-isolated Airflow Operators.
- 'Talking back' to Airflow is no longer possible in an IsolatedOperator. You cannot do Variable.set within an IsolatedOperator(operator=PythonOperator), nor can you query the Airflow Metadata Database.
I still don't understand what new features this operator offers compared to the KPO when it wraps the PythonOperator.
And for all the other operators, it sounds the same as using the KubernetesExecutor.
So I don't understand which use case this operator addresses that is not already possible.
You can, of course, do this with the vanilla KPO - it uses KPO. It is definitely just a fancy wrapper on top of some substrate. In the future it could also use DockerOperator or others as substrates.
What this adds is:
1) It does some pre-parsing with an easy interface to set the environment up to run the KPO via the IsolatedOperator - a user who knows Airflow but nothing about KPO or k8s should be able to run that.
2) The PostIsolationHook re-assembles everything on the other side. KPO normally can't easily run Airflow Operators, and Airflow Operators don't normally easily run with context/conns/vars without a direct connection to the Airflow metadata db. This makes all that simpler. It's vaguely similar to the magic that @task.kubernetes does.
3) The CLI makes it easy to build environments on top of an existing one. e.g. a different version of a provider, or underlying python library, or some additional key or configuration - whatever is needed to be separate from the original host airflow monolith environment 🤷
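The "pre-parse on the host, re-assemble on the other side" idea from points 1) and 2) can be sketched with the standard library alone. This is a rough illustration of the general pattern, not the provider's actual implementation: `pack_task`, `rebuild`, and the `ISOLATED_TASK` variable name are hypothetical, and `string.Template` merely stands in for an operator class because it is importable in any Python environment.

```python
import base64
import importlib
import json
from string import Template


def pack_task(operator_cls, kwargs, variables):
    """Host side: describe the task as plain strings that can be handed to a
    container as environment variables. All names here are hypothetical."""
    spec = {
        # Send the import path, not the class object: the isolated side
        # imports its *own* installed version of the class.
        "operator": f"{operator_cls.__module__}.{operator_cls.__qualname__}",
        "kwargs": kwargs,
        # Anything the task needs from the host must travel with it,
        # because the isolated side cannot query the metadata database.
        "variables": variables,
    }
    payload = base64.b64encode(json.dumps(spec).encode()).decode()
    return {"ISOLATED_TASK": payload}


def rebuild(env):
    """Isolated side: import the class by name and re-create the object."""
    spec = json.loads(base64.b64decode(env["ISOLATED_TASK"]))
    module_name, _, cls_name = spec["operator"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), cls_name)
    return cls(**spec["kwargs"]), spec["variables"]


# Stand-in usage: Template plays the role of the wrapped operator.
env = pack_task(Template, {"template": "hello $who"}, {"who": "airflow"})
task, variables = rebuild(env)
result = task.substitute(variables)
```

Because only strings cross the boundary, the receiving environment is free to have a different Python or library version installed, which is the property the thread is discussing.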
Happy to chat more on the OSS slack or anything. I'd also love feedback in general, or if there's things that you think this should encompass or not encompass
tl;dr:
thanks for answering, I'm still confused (it looks like the README is going to be the most complicated part of this project, because I'm a power user of Airflow and I'm still not sure how and when this operator is a good trade-off)
I've been looking at the examples and I don't see any usage of your tl;dr
could you share an example of using the IsolatedOperator with a PythonOperator using pandas==0.16.0
and an IsolatedOperator with a SimpleHttpOperator using apache-airflow-providers-http==2.0.2
also, does IsolatedOperator work with auto-completion of the args of the wrapped operator?
also, there is a more general question: is it a good idea to offer this helper, which will let users create spaghetti code (where code using some library versions sits inside code using other library versions) that does not follow the separation of concerns between scheduling and execution?
also, does IsolatedOperator work with auto-completion of the args of the wrapped operator?
If you mean in an IDE - I'm not sure how you could achieve that? Would it be a union of all possible operators that could be wrapped? I don't know how to provide dynamic type hints or docs to an IDE
Examples have been updated.
That was my expectation. Would you then agree to add a disclaimer to the README?
Something like pros/cons?
I still don't see how the isolated operator handles custom dependency versions.
It looks like this is going to run the code in the context of the current setup of the Airflow worker (in the case of the CeleryExecutor)
hey, I don't understand when this operator is useful.
Could you share a pragmatic example, thanks :+1: