astronomer / apache-airflow-providers-isolation

Apache License 2.0

Is this provider just a KPO helper? #13

Open raphaelauv opened 1 year ago

raphaelauv commented 1 year ago

Hey, I don't understand when this operator is useful.

Could you share a pragmatic example? Thanks :+1:

fritz-astronomer commented 1 year ago

Absolutely! You are a little early - not a full release yet 😄
I am working on the docs and a few finishing touches for the full 1.0 release here. Let me know if you have comments or suggestions. We'll likely announce it more broadly soon.

"Is this just a KPO helper?" Yes, but... Airflow is just fancy cron when you think about it 😁


Summary

The Isolation Provider provides the IsolatedOperator and the isolationctl CLI. Together they let you run any Airflow Operator in an isolated fashion.

Why use IsolatedOperator?

  • Run different versions of an underlying library for separate teams on the same Airflow instance, or even for separate tasks in the same DAG
  • Run entirely separate versions of Python for a task
  • Keep "heavy" dependencies separate from Airflow
  • Run an Airflow task in a completely separate environment - or on a server all the way across the world.
  • Run a task "safely" - separate from the Airflow instance
  • More easily run a task whose dependencies conflict with those of Airflow or of other tasks
  • Do all of the above while having unmodified access to (almost) all of 'normal' Airflow - operators, XComs, logs, deferring, callbacks.

What does the isolationctl provide?

  • isolationctl gives you an easy way to manage "environments" for your IsolatedOperator

When shouldn't you use IsolatedOperator?

  • If you can use un-isolated Airflow Operators, you should still use un-isolated Airflow Operators.
  • 'Talking back' to Airflow is no longer possible in an IsolatedOperator. You cannot do Variable.set within an IsolatedOperator(operator=PythonOperator), nor can you query the Airflow Metadata Database.
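
The 'talking back' limitation can be illustrated with a standard-library analogy. This is only a sketch of the general idea, not the provider's mechanism: a child Python process stands in for the isolated task, and an environment variable (the hypothetical `DEMO_VARIABLE`) stands in for an Airflow Variable. The child changes its own copy, and the host never sees the change.

```python
import os
import subprocess
import sys

# Host side: a value that lives in the host process (stand-in for an Airflow Variable).
os.environ["DEMO_VARIABLE"] = "set-by-host"

# Isolated side: a separate interpreter that tries to overwrite the value.
child_code = (
    "import os\n"
    "os.environ['DEMO_VARIABLE'] = 'set-by-isolated-task'\n"
    "print(os.environ['DEMO_VARIABLE'])\n"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_view = result.stdout.strip()       # what the isolated process sees
host_view = os.environ["DEMO_VARIABLE"]  # what the host still sees afterwards

print("isolated:", child_view)
print("host:", host_view)
```

The child only ever received a copy of the host's state, so anything it writes is lost when it exits unless it is explicitly sent back.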

raphaelauv commented 1 year ago

I still don't understand what new features this operator gives compared to the KPO when wrapping the PythonOperator.

And for all the other operators, it sounds like it is the same as using the KubernetesExecutor.

So I don't understand which use case this operator addresses that is not already possible.

fritz-astronomer commented 1 year ago

You can, of course, do this with the vanilla KPO - it uses KPO. It is definitely just a fancy wrapper on top of some substrate. In the future it could also use DockerOperator or others as substrates.

What this adds is:

1) It does some pre-parsing, with an easy interface, to set up the environment that the KPO runs via the IsolatedOperator. A user who knows Airflow but nothing about KPO or k8s should be able to run that.
2) The PostIsolationHook re-assembles everything on the other side. KPO normally can't easily run Airflow Operators, and Airflow Operators don't normally run easily with context/conns/vars without a direct connection to the Airflow metadata DB. This makes all of that simpler. It's vaguely similar to the magic that @task.kubernetes does.
3) The CLI makes it easy to build environments on top of an existing one, e.g. a different version of a provider or an underlying Python library, or some additional key or configuration - whatever needs to be separate from the original host Airflow monolith environment 🤷
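
The "pre-parse on the host, re-assemble on the other side" idea can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the provider's actual code: the `pack` and `unpack_and_build` helpers and the payload layout are hypothetical, and `datetime.timedelta` stands in for an operator class because it is importable anywhere and takes keyword arguments.

```python
import base64
import importlib
import json
from datetime import timedelta


def pack(operator, kwargs):
    """Host side: describe the callable by dotted path and encode its kwargs.

    Hypothetical payload format; only handles top-level, importable callables
    and JSON-serializable keyword arguments.
    """
    payload = {
        "qualname": f"{operator.__module__}.{operator.__qualname__}",
        "kwargs": kwargs,
    }
    # Base64 keeps the payload safe to pass as a single string (e.g. an env var).
    return base64.b64encode(json.dumps(payload).encode()).decode()


def unpack_and_build(blob):
    """Isolated side: re-import the callable by name and call it with the kwargs."""
    payload = json.loads(base64.b64decode(blob))
    module_name, _, attr = payload["qualname"].rpartition(".")
    operator = getattr(importlib.import_module(module_name), attr)
    return operator(**payload["kwargs"])


# timedelta is only a stand-in for "some operator class".
blob = pack(timedelta, {"hours": 1, "minutes": 30})
rebuilt = unpack_and_build(blob)
print(rebuilt)
```

The point of the sketch is that only a name and plain data cross the boundary, so the isolated side imports whatever version of the class is installed in its own environment.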

Happy to chat more on the OSS Slack or anything. I'd also love feedback in general, or to hear if there are things you think this should or shouldn't encompass.

fritz-astronomer commented 1 year ago

tl;dr:

raphaelauv commented 1 year ago

Thanks for answering. I'm still confused (it looks like the README is going to be the most complicated part of this project, because I'm a power user of Airflow and I'm still not sure how and when this operator is a good trade-off).

I've been looking at the example and I don't see any usage of your tl;dr.

Could you share an example of using the IsolatedOperator with a PythonOperator using pandas==0.16.0,

and an IsolatedOperator with a SimpleHttpOperator using apache-airflow-providers-http==2.0.2?

Also, does IsolatedOperator work with auto-completion of the wrapped operator's args?

There is also a more general question: is it a good idea to offer a helper that lets users write spaghetti code (where some code uses one version of a library inside code that uses another version) and that doesn't follow the separation of concerns between scheduling and execution?

fritz-astronomer commented 1 year ago

"Also, does IsolatedOperator work with auto-completion of the wrapped operator's args?"

If you mean in an IDE - I'm not sure how you could achieve that. Would it be a union of all possible operators that could be wrapped? I don't know how to provide dynamic type hints or docs to an IDE.

Examples have been updated.

raphaelauv commented 1 year ago

That was my expectation. Would you then agree to add a disclaimer to the README?

Something like pros/cons?

raphaelauv commented 1 year ago

I still don't see how the IsolatedOperator customizes the versions of its dependencies:

https://github.com/astronomer/apache-airflow-providers-isolation/blob/main/isolation/example_dags/isolation_provider_example_dag.py#L34

It looks like this is going to run the code in the context of the Airflow worker's current setup (in the case of the CeleryExecutor).