Closed eladkal closed 7 months ago
Tested https://github.com/apache/airflow/pull/36861, connection ID and impersonation chain are now properly passed. ✅
Test #36752 and it works as expected.
Verified https://github.com/apache/airflow/pull/36892, https://github.com/apache/airflow/pull/36946 and https://github.com/apache/airflow/pull/36894. will continue work on the rest
All my changes work ! :tada:
Validate changes in
I plan to check later other my changes
Validate changes in: https://github.com/apache/airflow/pull/36911 https://github.com/apache/airflow/pull/36905 https://github.com/apache/airflow/pull/36663 https://github.com/apache/airflow/pull/36934 https://github.com/apache/airflow/pull/36603 https://github.com/apache/airflow/pull/36489 https://github.com/apache/airflow/pull/36533 https://github.com/apache/airflow/pull/36491 https://github.com/apache/airflow/pull/36532 https://github.com/apache/airflow/pull/36530 https://github.com/apache/airflow/pull/36908
Hi!
https://github.com/apache/airflow/pull/36828 works as expected
Tested my changes, they look good.
36922 tested. It is working for configmaps mounted as volume but not for configmaps mounted as environment variable. I was troubleshooting the issue and it appears that I will need to make ‘env_from’ templated not ‘configmaps’ attribute. Any inputs from k8 operator experts? Also please let me know how do I proceed with the fix, a new issue or just open new PR with the fix?
Opened #37001 with the fix and unit tested
Opened https://github.com/apache/airflow/pull/37001 with the fix and unit tested
Cool. Yep. that's ok. In Airlfow PR
is the unit of work
- not issue, issue is mostly advisory and can be missing or there can be several PRs referring to the same issue, some of the PRs being incomplete. I understand that this is is a new feature, so it is not a regression?
Tested https://github.com/apache/airflow/pull/36171. Work as expected.
Env: Amazon MWAA v2.7.2 Instance.
Steps: (1) Started a fresh MWAA instance
(2) Added to requirements amazon: 8.17.0rc1, also bump constrains:
apache-airflow-providers-amazon==8.17.0rc1
boto3==1.33.0
botocore==1.33.0
s3transfer==0.8.0
redshift_connector==2.0.918
(3) Created connection athena_default
with extras
{
"work_group": "primary",
"region_name": "us-east-1"
}
(4) Created connection athena_assumed_role
with extras
{
"work_group": "primary",
"region_name": "us-east-1",
"role_arn": "arn:aws:iam::xxxxxxxxx:role/athena-access"
}
(5) Test DAG
create_sql_table1 = SQLExecuteQueryOperator(
task_id="create_sql_table1",
conn_id='athena_default',
sql='SELECT 1;SELECT 2;SELECT 3;SELECT 4',
split_statements=True,
dag=dag,
)
create_sql_table2 = SQLExecuteQueryOperator(
task_id="create_sql_table2",
conn_id='athena_assumed_role',
sql='SELECT 1;SELECT 2;SELECT 3;SELECT 4',
split_statements=True,
dag=dag,
)
create_sql_table1 >> create_sql_table2
Opened #37001 with the fix and unit tested
Cool. Yep. that's ok. In Airlfow
PR
is theunit of work
- not issue, issue is mostly advisory and can be missing or there can be several PRs referring to the same issue, some of the PRs being incomplete. I understand that this is is a new feature, so it is not a regression?
Yes, this is a new feature
Yes, this is a new feature
So I will not issue RC2 as this release has many new features that we shouldnt hold. The fix will be released in the next wave
logger_name
not implemented in LoggingMixin
class constructor yetError which raised on Airflow 2.6.3
[2024-01-25, 08:29:34 UTC] Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/slack/operators/slack.py", line 258, in execute
self._method_resolver(
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/slack/operators/slack.py", line 254, in _method_resolver
return self.hook.send_file
File "/usr/local/lib/python3.10/functools.py", line 981, in __get__
val = self.func(instance)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/slack/operators/slack.py", line 82, in hook
return SlackHook(
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/slack/hooks/slack.py", line 117, in __init__
super().__init__(logger_name=extra_client_args.pop("logger_name", None))
TypeError: LoggingMixin.__init__() got an unexpected keyword argument 'logger_name'
I guess this affects only providers which explicitly propagate, something like super().__init__(logger_name=kwargs.pop("logger_name", None))
:
cc: @eladkal @hussein-awala
Do simple test for migration AWS operators/sensors/triggerers to base classes. Works fine, however for most the classes I've check only the instantiating objects
I guess this affects only providers which explicitly propagate, something like super().init(logger_name=kwargs.pop("logger_name", None)):
mongo: 3.6.0rc1 segment: 3.5.0rc1 slack: 8.6.0rc1 common.sql: 1.11.0rc1
Yes. Looks like the #34964 that introduced the logger_name had a hidden back-compatibility issue and we simply can't pass loggger_name for airflow < 2.8.0 . I guess we should remove these providers and only pass logger_name
from our providers conditionally (and add pre-commit for that).
Smth like:
if packaging.version.parse(packaging.version.parse(airflow_version).base_version) < packaging.version.parse(
"2.8.0"
):
super().__init__()
else:
super().__init__(logger_name=kwargs.pop("logger_name", None))
Or it might be something like that for avoid parsing version
extra_kwargs = {}
if logger_name := kwargs.pop("logger_name", None):
extra_kwargs["logger_name"] = logger_name
super().__init__(**extra_kwargs)
The first approach - the difference in the first case is that you can use same DAG / Operator for different Airlfow versions. The second one depends on how you use it - so for example you would not be able to have operator with logger_name that will work both pre- and post- 2.8.0 (unless you do version check in the operator)
My proposed fix it is just for avoid provide logger_name
implicitly even if it not defined. If we would like to check version than we need to make fixes in all providers.
In general usage users can't use/define attributes which not available into the BaseOperator / BaseHook for particular version.
For example users can't define schedule="@once"
and expect that it should work into the Airflow 2.3. The same here if user define logger_name
in Airflow < 2.8 then well it just won't work because feature only available in Airflow 2.8.
My proposed fix it is just for avoid provide logger_name implicitly even if it not defined. If we would like to check version than we need to make fixes in all providers.
I understand - but the thing is that our Hooks are mostly used internally by Operators and both Hooks and opertors are relased in providers, so there is a risk that someone will add an Operator that will initialize Hook with logger_name (which might be then exposed by or even hard-coded in the operator) . By implementing a protection against it in Hook, we allow such change to happen without waiting for min-airflow >= 2.8.0.
In other words - our code in Airflow that uses hooks - will not be able to use the logger_name
in any way if we do it the 2nd way. We will not be able to release an Operator which will adds the option of specifying it - you will only be able to specify the `logger_name" directly in the DAG code were Hooks are used directly or when users would like to develop custom operators and they are 100% sure the operator will be used in airflow 2.8.0+. Maybe that's what we want to do, but I think it is limiting. So for example if someone develops custom provider, that might use (say) common.sql DBApiHook. They will have to add "apache-airflow>=2.8.0" if their Operator will pass "logger_name" to the hook.
The first proposal will work in all cases. It will only propagate logger_name to Base Hook in Airflow 2.8.0 - and will set no limitations on the users of the Hook to know they are run in Airflow 2.8.0 environment. You might just write custom operator (or we can implement it in our operators) to specify logger_name and it will work, regardless of what "airflow version" the operator is installed at. So in the above example - if somoene would like to create their own provider and use "DBApiHook" they will be able to create an operator that will use it it and pass "logger_name" = "mylogger" to it (and they will not have to add apache-airlfow>= 2.8.0
in their provider.
Then we need to add this fix into the all providers affected by https://github.com/apache/airflow/pull/36675 (all?) or revert it for now. Depend on what is faster. In any cases we need to prepare rc2 for all providers I guess
@eladkal asked me to take over from here as he is flying tonight. I thought a bit about it and agree with @Taragolis that reverting #36675 is the best way to approach this - including re-releasing RC2 with accelerated voting (and implementing better support for logger_name) for the next wave.
@potiuk would re-releasing RC2 include cncf.Kubernetes provider? @eladkal said in his last comment for my PR #37001 that he will not release rc2 but include it in the next wave. With this recent change of rc2 re-releasing, would that be possible to reconsider and include #37001 in this re-releasing or not possible? I have been trying to get it approved by the listed reviewers but no luck so far. Please advise
@potiuk would re-releasing RC2 include cncf.Kubernetes provider? @eladkal said in his last comment for my PR #37001 that he will not release rc2 but include it in the next wave. With this recent change of rc2 re-releasing, would that be possible to reconsider and include #37001 in this re-releasing or not possible? I have been trying to get it approved by the listed reviewers but no luck so far. Please advise
Yep. Included it.
@potiuk would re-releasing RC2 include cncf.Kubernetes provider? @eladkal said in his last comment for my PR #37001 that he will not release rc2 but include it in the next wave. With this recent change of rc2 re-releasing, would that be possible to reconsider and include #37001 in this re-releasing or not possible? I have been trying to get it approved by the listed reviewers but no luck so far. Please advise
Yep. Included it.
Thanks a lot!
Hey Everyone - the issue is updated with new RC candidates and I am calling for another vote in a moment. Let's continue testing here!
I kept the status from previous tests BTW
Tested #36922 and #37001, working as expected
cc: @potiuk
Body
I have a kind request for all the contributors to the latest provider packages release. Could you please help us to test the RC versions of the providers?
The guidelines on how to test providers can be found in
Verify providers by contributors
Let us know in the comment, whether the issue is addressed.
Those are providers that require testing as there were some substantial changes introduced:
Provider airbyte: 3.6.0rc2
Provider alibaba: 2.7.2rc2
__init__
inanalyticdb_spark.py
(#36911): @romsharon98Provider amazon: 8.17.0rc2
is_authorized_dag
in AWS auth manager (#36619): @vincbeck[aws] cloudwatch_task_handler_json_serializer
not set (#36851): @TaragolisAwsEcsExecutor
(#36179): @syedahsnProvider apache.beam: 5.6.0rc2
Provider apache.druid: 3.8.0rc2
Provider apache.hive: 6.4.2rc1
__init__
inhive-stats
(#36905): @romsharon98Provider apache.spark: 4.7.1rc1
Provider atlassian.jira: 2.5.1rc1
atlassian-python-api
limitation (#36841): @Taragolisatlassian-python-api
to <3.41.6 (#36815): @TaragolisProvider celery: 3.5.2rc2
Provider cncf.kubernetes: 7.14.0rc2
KubernetesPodOperator
(#37001): @vizeitProvider cohere: 1.1.2rc1
Provider common.sql: 1.10.1rc1
Provider databricks: 6.1.0rc2
Provider dbt.cloud: 3.6.0rc2
Provider discord: 3.6.0rc2
Provider elasticsearch: 5.3.2rc1
Provider exasol: 4.4.2rc2
Provider google: 10.14.0rc2
BigQueryToSqlBaseOperator
fromBigQueryToPostgresOperator
(#36663): @romsharon98__init__
inauto_ml.py
(#36934): @romsharon98__init__
inSFTPToGCSOperator
(#36603): @romsharon98__init__
in dataproc (#36489): @romsharon98parquet_row_group_size
inBaseSQLToGCSOperator
(#36817): @renzepost__init__
inBigQueryToPostgresOperator
operator (#36491): @romsharon98Provider hashicorp: 3.6.2rc1
raise_on_deleted_version=True
toread_secret_version
in Hashicorp operator (#36532): @romsharon98Provider http: 4.9.0rc2
Provider mongo: 3.6.0rc2
mongo_conn_id
argument into the MongoHook constructor (#36896): @TaragolisProvider mysql: 5.5.2rc2
Provider odbc: 4.4.1rc2
Provider openlineage: 1.5.0rc2
Provider pagerduty: 3.6.1rc1
Provider papermill: 3.6.1rc2
__init__
inpapermill.py
(#36530): @romsharon98Provider pinecone: 1.1.2rc1
pinecone-client
to <3.0 (#36818): @TaragolisProvider presto: 5.4.1rc2
Provider salesforce: 5.6.2rc1
Provider slack: 8.6.0rc2
client.files_upload_v2
in Slack Provider (#36757): @TaragolisProvider snowflake: 5.3.0rc2
Provider tableau: 4.4.1rc1
Provider telegram: 4.3.1rc1
Provider trino: 5.6.1rc2
Provider weaviate: 1.3.1rc2
__init__
in weaviate (#36908): @romsharon98Provider yandex: 3.8.0rc2
All users involved in the PRs: @romsharon98 @kacpermuda @dabla @renzepost @vizeit @varuntwr @AchimGaedkeLynker @hamedhsn @Taragolis @Lee-W @ferruzzi @sasidharan-rathinam @hussein-awala @chrishronek @syedahsn @vatsrahul1001 @arjunanan
Committer