elyra-ai / elyra

Elyra extends JupyterLab with an AI centric approach.
https://elyra.readthedocs.io/en/stable/
Apache License 2.0

[WIP] Airflow package catalog connector for Airflow 2.x wheel, make import of all core operators possible, parsing changes #3208

Open shalberd opened 10 months ago

shalberd commented 10 months ago

fixes #2124

@nanaones I am not sure whether the Airflow package catalog connector already existed as a feature in 2021, but it does now. Still, the import is incomplete even on initial setup of the package catalog connector, e.g. when importing the wheel file that contains the Airflow 2.x core operators from

https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl

For example, the BashOperator and EmailOperator are missing. The detection logic needs fixes, something that @ianonavy already fixed in a fork.

The message in the Elyra container log is as follows:

[I 2024-01-05 10:52:38.508 ElyraApp] Analysis of 'https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl' completed. Located 9 operator classes in 4 Python scripts.
[W 2024-01-05 10:52:38.521 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...

The package catalog connector code needs fixes. This PR is based on work by @ianonavy in a fork of the package catalog connector, so that the BashOperator and, more generally, the base operators shipped with Airflow work.

Concept behind the Airflow package catalog connector in the Elyra main source: https://github.com/elyra-ai/elyra/tree/main/elyra/pipeline/airflow/package_catalog_connector

That work was done in a fork outside community Elyra and has not been tested or discussed here so far.

I am including the change here so that it makes it into community Elyra.

As it stands, the Airflow package catalog connector makes only an incomplete subset of operators available when given an Airflow 2.x wheel file.
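To see which operator scripts a wheel actually ships, independent of what the connector detects, the wheel can be opened as a plain zip archive. The following is only a minimal sketch and not the connector's code: the function name `list_operator_scripts` is hypothetical, and the `airflow/operators/` prefix is an assumption taken from the file paths in the log lines.

```python
import io
import zipfile


def list_operator_scripts(wheel_bytes, prefix="airflow/operators/"):
    """Return the sorted Python script paths under `prefix` inside a wheel.

    `wheel_bytes` is the raw content of the .whl file (a wheel is a zip archive).
    `prefix` is an assumption based on the paths shown in the Elyra log.
    """
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            if name.startswith(prefix)
            and name.endswith(".py")
            and not name.endswith("__init__.py")  # package markers hold no operators
        )
```

Comparing the length of this list with the "Located N operator classes in M Python scripts" message gives a rough idea of how many scripts the detection logic passes over.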

Screenshot 2024-01-12 at 17:48:41

The fix in this PR commit makes the BaseOperator reference location compatible with Airflow 2.x:

https://airflow.apache.org/docs/apache-airflow/2.6.2/_api/airflow/models/baseoperator/index.html#

BaseOperator has been in this new location all the way back to 2.0.0:

https://airflow.apache.org/docs/apache-airflow/2.0.0/_api/airflow/models/baseoperator/index.html
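The idea of the detection change can be sketched with Python's `ast` module. This is a simplified illustration, not the connector's actual code: the names `BASE_OPERATOR_MODULES` and `find_operator_classes` are hypothetical, and treating classes that extend an operator found earlier in the same script as operators too is an assumption about how indirect subclasses get picked up.

```python
import ast

# Module paths from which BaseOperator may be imported. The second entry is
# the Airflow 2.x location discussed above; the first is the older path.
BASE_OPERATOR_MODULES = {"airflow.models", "airflow.models.baseoperator"}


def find_operator_classes(source):
    """Return names of top-level classes in `source` that derive from BaseOperator."""
    tree = ast.parse(source)

    # Collect the local names bound to BaseOperator (covers `import ... as ...`).
    known_bases = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module in BASE_OPERATOR_MODULES:
            for alias in node.names:
                if alias.name == "BaseOperator":
                    known_bases.add(alias.asname or alias.name)

    operators = []
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        base_ids = {base.id for base in node.bases if isinstance(base, ast.Name)}
        if base_ids & known_bases:
            operators.append(node.name)
            # Assumption: later classes extending this operator count as well.
            known_bases.add(node.name)
    return operators
```

With only `"airflow.models"` in the set, a script that uses `from airflow.models.baseoperator import BaseOperator` yields no operators at all, which matches the symptom of operators missing from the palette.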

The change in this PR finally makes the Airflow core operators available in the Elyra pipeline editor:

https://airflow.apache.org/docs/apache-airflow/2.6.2/operators-and-hooks-ref.html

After the change, I get a different log on Elyra start when the wheel file is evaluated. It looks much better, with more operator classes detected (16 instead of 9).

[I 2024-01-12 22:25:00.524 ElyraApp] Analysis of ''https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl'' completed. Located 16 operator classes in 11 Python scripts.
[W 2024-01-12 22:25:00.568 ServerApp] Operator 'BaseBranchOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/branch.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.571 ServerApp] Operator 'EmptyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/empty.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.587 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.592 ServerApp] Operator 'LatestOnlyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/latest_only.py'}' does not have an __init__ function. Skipping...

The operators skipped in the log above are pipeline-related, but not via the mechanisms offered by KubernetesPodOperator and IBM pipelines, so they should be skipped from a use case perspective, and also because they do not define their own __init__ function. Besides that, the EmptyOperator is a placeholder dummy operator anyway, with no functionality.
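The "does not have an __init__ function" condition from the warnings can be reproduced with a small `ast` check. This is a sketch under assumptions, not Elyra's parser: the function name `classes_without_init` is hypothetical, and only `__init__` methods defined directly in the class body are considered, not inherited ones.

```python
import ast


def classes_without_init(source):
    """Return names of top-level classes in `source` that define no __init__ of their own."""
    tree = ast.parse(source)
    missing = []
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        has_init = any(
            isinstance(item, ast.FunctionDef) and item.name == "__init__"
            for item in node.body
        )
        if not has_init:
            missing.append(node.name)
    return missing
```

A class that only overrides `execute` and relies on the parent's constructor ends up in the returned list, so a parser that reads properties from `__init__` has nothing to work with for it.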

Let's check the GUI after the wheel file import with the change in this PR.

For example, I can now see and use the BashOperator with Airflow 2.x.

Screenshot 2024-01-12 at 23:31:57

What changes were proposed in this pull request?

No code refactoring, just a small change to the operator detection logic for Airflow 2.0.0 and higher.

Add package connector support for Airflow 2.x: the check for subclasses of BaseOperator uses an outdated package name. This commit and PR add the new one from Airflow 2, where BaseOperator has been located since 2.0.0:

https://airflow.apache.org/docs/apache-airflow/2.0.0/_api/airflow/models/baseoperator/index.html

How was this pull request tested?

No changes to any existing Elyra unit tests. However, the flow of importing the Airflow 2.x core operators is explained above, with results before and after the change in this PR. The import worked fine, and the operators are visible in the left palette of the Elyra pipeline editor. Tested with: Airflow 2.6.2

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

shalberd commented 2 months ago

to do:

https://stackoverflow.com/questions/76067543/using-python-ast-to-get-the-value-from-keyword-of-a-context-manager

i.e. in https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/airflow/component_parser_airflow.py#L203

all operators from https://github.com/apache/airflow/blob/2.8.2/airflow/operators/bash.py

Evaluate whether AST parsing can be used for this.

see also my comment at https://github.com/elyra-ai/elyra/issues/2124#issuecomment-1895321756
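
As a starting point for that evaluation, here is a rough sketch of how `ast` could read an operator's `__init__` parameters and their defaults without importing Airflow. It is not taken from component_parser_airflow.py: the function name `init_parameters` is hypothetical, and it relies on `ast.unparse`, which requires Python 3.9 or later.

```python
import ast


def init_parameters(source, class_name):
    """Map each __init__ parameter of `class_name` to its default as source text.

    Parameters without a default map to None. Returns None when the class
    is not found or defines no __init__ of its own.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                    return _describe(item.args)
    return None


def _describe(args):
    params = {}
    positional = args.posonlyargs + args.args
    # Defaults belong to the last len(defaults) positional parameters.
    padding = [None] * (len(positional) - len(args.defaults))
    for arg, default in zip(positional, padding + list(args.defaults)):
        if arg.arg != "self":
            params[arg.arg] = ast.unparse(default) if default is not None else None
    # Keyword-only parameters carry None in kw_defaults when they have no default.
    for arg, default in zip(args.kwonlyargs, args.kw_defaults):
        params[arg.arg] = ast.unparse(default) if default is not None else None
    return params
```

Returning defaults as source text keeps the sketch safe for non-literal defaults; `*args` and `**kwargs` are deliberately left out of the result.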