apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
36.27k stars 14.08k forks source link

Status of testing Providers that were prepared on January 26, 2024 #36948

Closed eladkal closed 7 months ago

eladkal commented 7 months ago

Body

I have a kind request for all the contributors to the latest provider packages release. Could you please help us to test the RC versions of the providers?

The guidelines on how to test providers can be found in

Verify providers by contributors

Let us know in the comment, whether the issue is addressed.

Those are providers that require testing as there were some substantial changes introduced:

Provider airbyte: 3.6.0rc2

All users involved in the PRs: @romsharon98 @kacpermuda @dabla @renzepost @vizeit @varuntwr @AchimGaedkeLynker @hamedhsn @Taragolis @Lee-W @ferruzzi @sasidharan-rathinam @hussein-awala @chrishronek @syedahsn @vatsrahul1001 @arjunanan

Committer

m1racoli commented 7 months ago

Tested https://github.com/apache/airflow/pull/36861, connection ID and impersonation chain are now properly passed. ✅

arjunanan6 commented 7 months ago

Test #36752 and it works as expected.

Lee-W commented 7 months ago

Verified https://github.com/apache/airflow/pull/36892, https://github.com/apache/airflow/pull/36946 and https://github.com/apache/airflow/pull/36894. will continue work on the rest

potiuk commented 7 months ago

All my changes work ! :tada:

Taragolis commented 7 months ago

Validate changes in

I plan to check later other my changes

romsharon98 commented 7 months ago

Validate changes in: https://github.com/apache/airflow/pull/36911 https://github.com/apache/airflow/pull/36905 https://github.com/apache/airflow/pull/36663 https://github.com/apache/airflow/pull/36934 https://github.com/apache/airflow/pull/36603 https://github.com/apache/airflow/pull/36489 https://github.com/apache/airflow/pull/36533 https://github.com/apache/airflow/pull/36491 https://github.com/apache/airflow/pull/36532 https://github.com/apache/airflow/pull/36530 https://github.com/apache/airflow/pull/36908

VladaZakharova commented 7 months ago

Hi!

36473 works as well, thank you :)

moiseenkov commented 7 months ago

36276 works as expected

AchimGaedkeLynker commented 7 months ago

https://github.com/apache/airflow/pull/36828 works as expected

Lee-W commented 7 months ago

Tested https://github.com/apache/airflow/pull/36586, https://github.com/apache/airflow/pull/36578, https://github.com/apache/airflow/pull/36550, https://github.com/apache/airflow/pull/36680, https://github.com/apache/airflow/pull/36658, https://github.com/apache/airflow/pull/36749

renzepost commented 7 months ago

36817 works as expected

hussein-awala commented 7 months ago

Tested my changes, they look good.

vizeit commented 7 months ago

36922 tested. It is working for configmaps mounted as volume but not for configmaps mounted as environment variable. I was troubleshooting the issue and it appears that I will need to make ‘env_from’ templated not ‘configmaps’ attribute. Any inputs from k8 operator experts? Also please let me know how do I proceed with the fix, a new issue or just open new PR with the fix?

vizeit commented 7 months ago

36922 tested. It is working for configmaps mounted as volume but not for configmaps mounted as environment variable. I was troubleshooting the issue and it appears that I will need to make ‘env_from’ templated not ‘configmaps’ attribute. Any inputs from k8 operator experts? Also please let me know how do I proceed with the fix, a new issue or just open new PR with the fix?

Opened #37001 with the fix and unit tested

potiuk commented 7 months ago

Opened https://github.com/apache/airflow/pull/37001 with the fix and unit tested

Cool. Yep. that's ok. In Airlfow PR is the unit of work - not issue, issue is mostly advisory and can be missing or there can be several PRs referring to the same issue, some of the PRs being incomplete. I understand that this is is a new feature, so it is not a regression?

flolas commented 7 months ago

Tested https://github.com/apache/airflow/pull/36171. Work as expected.

Env: Amazon MWAA v2.7.2 Instance.

Steps: (1) Started a fresh MWAA instance

(2) Added to requirements amazon: 8.17.0rc1, also bump constrains:

apache-airflow-providers-amazon==8.17.0rc1
boto3==1.33.0
botocore==1.33.0
s3transfer==0.8.0
redshift_connector==2.0.918

(3) Created connection athena_default with extras

{
  "work_group": "primary",
  "region_name": "us-east-1"
}

(4) Created connection athena_assumed_role with extras

{
  "work_group": "primary",
  "region_name": "us-east-1",
  "role_arn": "arn:aws:iam::xxxxxxxxx:role/athena-access"
}

(5) Test DAG

create_sql_table1 = SQLExecuteQueryOperator(
            task_id="create_sql_table1",
            conn_id='athena_default',
            sql='SELECT 1;SELECT 2;SELECT 3;SELECT 4',
            split_statements=True,
            dag=dag,
        )
create_sql_table2 = SQLExecuteQueryOperator(
            task_id="create_sql_table2",
            conn_id='athena_assumed_role',
            sql='SELECT 1;SELECT 2;SELECT 3;SELECT 4',
            split_statements=True,
            dag=dag,
        )
create_sql_table1 >> create_sql_table2
vizeit commented 7 months ago

Opened #37001 with the fix and unit tested

Cool. Yep. that's ok. In Airlfow PR is the unit of work - not issue, issue is mostly advisory and can be missing or there can be several PRs referring to the same issue, some of the PRs being incomplete. I understand that this is is a new feature, so it is not a regression?

Yes, this is a new feature

eladkal commented 7 months ago

Yes, this is a new feature

So I will not issue RC2 as this release has many new features that we shouldnt hold. The fix will be released in the next wave

Taragolis commented 7 months ago

Error which raised on Airflow 2.6.3

[2024-01-25, 08:29:34 UTC] Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/slack/operators/slack.py", line 258, in execute
    self._method_resolver(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/slack/operators/slack.py", line 254, in _method_resolver
    return self.hook.send_file
  File "/usr/local/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/slack/operators/slack.py", line 82, in hook
    return SlackHook(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/slack/hooks/slack.py", line 117, in __init__
    super().__init__(logger_name=extra_client_args.pop("logger_name", None))
TypeError: LoggingMixin.__init__() got an unexpected keyword argument 'logger_name'

I guess this affects only providers which explicitly propagate, something like super().__init__(logger_name=kwargs.pop("logger_name", None)):

cc: @eladkal @hussein-awala

Taragolis commented 7 months ago

Do simple test for migration AWS operators/sensors/triggerers to base classes. Works fine, however for most the classes I've check only the instantiating objects

potiuk commented 7 months ago

I guess this affects only providers which explicitly propagate, something like super().init(logger_name=kwargs.pop("logger_name", None)):

mongo: 3.6.0rc1 segment: 3.5.0rc1 slack: 8.6.0rc1 common.sql: 1.11.0rc1

Yes. Looks like the #34964 that introduced the logger_name had a hidden back-compatibility issue and we simply can't pass loggger_name for airflow < 2.8.0 . I guess we should remove these providers and only pass logger_name from our providers conditionally (and add pre-commit for that).

Smth like:

if packaging.version.parse(packaging.version.parse(airflow_version).base_version) < packaging.version.parse(
    "2.8.0"
):
    super().__init__()
else:
     super().__init__(logger_name=kwargs.pop("logger_name", None))
Taragolis commented 7 months ago

Or it might be something like that for avoid parsing version

extra_kwargs = {}
if logger_name := kwargs.pop("logger_name", None):
    extra_kwargs["logger_name"] = logger_name
super().__init__(**extra_kwargs)
potiuk commented 7 months ago

The first approach - the difference in the first case is that you can use same DAG / Operator for different Airlfow versions. The second one depends on how you use it - so for example you would not be able to have operator with logger_name that will work both pre- and post- 2.8.0 (unless you do version check in the operator)

Taragolis commented 7 months ago

My proposed fix it is just for avoid provide logger_name implicitly even if it not defined. If we would like to check version than we need to make fixes in all providers.

In general usage users can't use/define attributes which not available into the BaseOperator / BaseHook for particular version. For example users can't define schedule="@once" and expect that it should work into the Airflow 2.3. The same here if user define logger_name in Airflow < 2.8 then well it just won't work because feature only available in Airflow 2.8.

potiuk commented 7 months ago

My proposed fix it is just for avoid provide logger_name implicitly even if it not defined. If we would like to check version than we need to make fixes in all providers.

I understand - but the thing is that our Hooks are mostly used internally by Operators and both Hooks and opertors are relased in providers, so there is a risk that someone will add an Operator that will initialize Hook with logger_name (which might be then exposed by or even hard-coded in the operator) . By implementing a protection against it in Hook, we allow such change to happen without waiting for min-airflow >= 2.8.0.

In other words - our code in Airflow that uses hooks - will not be able to use the logger_name in any way if we do it the 2nd way. We will not be able to release an Operator which will adds the option of specifying it - you will only be able to specify the `logger_name" directly in the DAG code were Hooks are used directly or when users would like to develop custom operators and they are 100% sure the operator will be used in airflow 2.8.0+. Maybe that's what we want to do, but I think it is limiting. So for example if someone develops custom provider, that might use (say) common.sql DBApiHook. They will have to add "apache-airflow>=2.8.0" if their Operator will pass "logger_name" to the hook.

The first proposal will work in all cases. It will only propagate logger_name to Base Hook in Airflow 2.8.0 - and will set no limitations on the users of the Hook to know they are run in Airflow 2.8.0 environment. You might just write custom operator (or we can implement it in our operators) to specify logger_name and it will work, regardless of what "airflow version" the operator is installed at. So in the above example - if somoene would like to create their own provider and use "DBApiHook" they will be able to create an operator that will use it it and pass "logger_name" = "mylogger" to it (and they will not have to add apache-airlfow>= 2.8.0 in their provider.

Taragolis commented 7 months ago

Then we need to add this fix into the all providers affected by https://github.com/apache/airflow/pull/36675 (all?) or revert it for now. Depend on what is faster. In any cases we need to prepare rc2 for all providers I guess

potiuk commented 7 months ago

@eladkal asked me to take over from here as he is flying tonight. I thought a bit about it and agree with @Taragolis that reverting #36675 is the best way to approach this - including re-releasing RC2 with accelerated voting (and implementing better support for logger_name) for the next wave.

vizeit commented 7 months ago

@potiuk would re-releasing RC2 include cncf.Kubernetes provider? @eladkal said in his last comment for my PR #37001 that he will not release rc2 but include it in the next wave. With this recent change of rc2 re-releasing, would that be possible to reconsider and include #37001 in this re-releasing or not possible? I have been trying to get it approved by the listed reviewers but no luck so far. Please advise

potiuk commented 7 months ago

@potiuk would re-releasing RC2 include cncf.Kubernetes provider? @eladkal said in his last comment for my PR #37001 that he will not release rc2 but include it in the next wave. With this recent change of rc2 re-releasing, would that be possible to reconsider and include #37001 in this re-releasing or not possible? I have been trying to get it approved by the listed reviewers but no luck so far. Please advise

Yep. Included it.

vizeit commented 7 months ago

@potiuk would re-releasing RC2 include cncf.Kubernetes provider? @eladkal said in his last comment for my PR #37001 that he will not release rc2 but include it in the next wave. With this recent change of rc2 re-releasing, would that be possible to reconsider and include #37001 in this re-releasing or not possible? I have been trying to get it approved by the listed reviewers but no luck so far. Please advise

Yep. Included it.

Thanks a lot!

potiuk commented 7 months ago

Hey Everyone - the issue is updated with new RC candidates and I am calling for another vote in a moment. Let's continue testing here!

potiuk commented 7 months ago

I kept the status from previous tests BTW

listik commented 7 months ago

36813 tested and working ✅

vizeit commented 7 months ago

Tested #36922 and #37001, working as expected

cc: @potiuk

potiuk commented 7 months ago

Thank you everyone. Providers are released!

I invite everyone to help improve providers for the next release, a list of open issues can be found here.