apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Status of testing Providers that were prepared on October 30, 2021 #19328

Closed potiuk closed 2 years ago

potiuk commented 2 years ago

Body

I have a kind request for all the contributors to the latest provider packages release. Could you help us test the RC versions of the providers and let us know in a comment whether the issue is addressed there?

Providers that need testing

Those are providers that require testing as there were some substantial changes introduced:

Provider amazon: 2.4.0rc1
Provider apache.beam: 3.1.0rc1
Provider apache.cassandra: 2.1.0rc1
Provider apache.hdfs: 2.1.1rc1
Provider apache.hive: 2.0.3rc1
Provider apache.livy: 2.1.0rc1
Provider asana: 1.1.0rc1
Provider cncf.kubernetes: 2.1.0rc1
Provider databricks: 2.0.2rc1
Provider docker: 2.3.0rc1
Provider elasticsearch: 2.1.0rc1
Provider exasol: 2.0.1rc1
Provider facebook: 2.1.0rc1
Provider google: 6.1.0rc1
Provider hashicorp: 2.1.1rc1
Provider jdbc: 2.0.1rc1
Provider jenkins: 2.0.3rc1
Provider microsoft.azure: 3.3.0rc1
Provider microsoft.psrp: 1.0.1rc1
Provider mongo: 2.2.0rc1
Provider neo4j: 2.0.2rc1
Provider openfaas: 2.0.0rc1
Provider pagerduty: 2.1.0rc1
Provider papermill: 2.1.0rc1
Provider postgres: 2.3.0rc1
Provider salesforce: 3.3.0rc1
Provider samba: 3.0.1rc1
Provider sftp: 2.2.0rc1
Provider slack: 4.1.0rc1
Provider snowflake: 2.3.0rc1
Provider sqlite: 2.0.1rc1
Provider ssh: 2.3.0rc1
Provider tableau: 2.1.2rc1
Provider trino: 2.0.2rc1
Provider yandex: 2.1.0rc1

Providers that do not need testing

Those are providers that were either doc-only or had changes that do not require testing.

Thanks to all who contributed to these providers' releases

@sreenath-kamath @jarfgit @frankcash @deedmitrij @enima2684 @uranusjr @peter-volkov @mariotaddeucci @potatochip @malthe @msumit @dimberman @jameslamb @Brooke-white @GuidoTournois @minu7 @ashb @shadrus @ignaski @eladkal @ephraimbuddy @eskarimov @JavierLopezT @keze @josh-fell @raphaelauv @blag @Aakcht @guotongfei @SamWheating @danarwix @subkanthi @alexbegg @bhavaniravi @tnyz @Goodkat @fredthomsen @SayTen @kaxil @lwyszomi @ReadytoRocc @baolsen @30blay @nathadfield @xuan616 @RyanSiu1995

Committer

raphaelauv commented 2 years ago

For cncf.kubernetes: 2.1.0rc1, it's all good.

Add more information to PodLauncher timeout error (#17953) -> is working

   File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 374, in execute
    raise AirflowException(f'Pod Launching failed: {ex}')
airflow.exceptions.AirflowException: Pod Launching failed: Pod took longer than 120 seconds to start. Check the pod events in kubernetes to determine why.
[2021-10-30, 14:21:41 UTC] {local_task_job.py:154} INFO - Task exited with return code 1
[2021-10-30, 14:21:41 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
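
As a minimal sketch (the parameter values here are assumptions, not taken from the test DAG used above), a KubernetesPodOperator task like the following exercises that timeout path when the pod cannot start within startup_timeout_seconds:

    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

    # The "Pod took longer than 120 seconds to start" message above comes from this
    # startup timeout; with #17953 the exception also tells you to check the pod
    # events in kubernetes.
    wait_for_pod = KubernetesPodOperator(
        task_id="slow_starting_pod",
        name="slow-starting-pod",
        namespace="default",
        image="python:3.9",
        cmds=["python", "-c", "print('hello')"],
        startup_timeout_seconds=120,
    )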

Add more type hints to PodLauncher (#18928) -> just typing, so OK

pavelhlushchanka commented 2 years ago

I think #18733 was released in 2.3.0

raphaelauv commented 2 years ago

For google: 6.1.0rc1

Google provider catch invalid secret name (#18790) -> is working

[2021-10-30 14:56:31,254] {logging_mixin.py:104} INFO - Running <TaskInstance: a_nice_dag.also_run_this 2021-10-20T00:00:00+00:00 [running]> on host 0dbece050201
[2021-10-30 14:56:31,797] {secret_manager_client.py:100} ERROR - Google Cloud API Call Error (InvalidArgument): Invalid secret ID XXXXXXXXXXXXXX-variable-toto.tata.
                Only ASCII alphabets (a-Z), numbers (0-9), dashes (-), and underscores (_)
                are allowed in the secret ID.

[2021-10-30 14:56:31,858] {taskinstance.py:1300} INFO - Exporting the following env vars:
...

The error about secrets with an invalid character is now caught.
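
A hedged illustration of the case above (the backend configuration values are assumptions, not the exact setup used in the test): with the Google Secret Manager secrets backend enabled, a Variable key containing characters outside [a-zA-Z0-9-_] produces an invalid secret ID, and with #18790 the resulting InvalidArgument error is logged instead of breaking the lookup:

    # Assumed secrets backend configuration (airflow.cfg), for illustration only:
    # [secrets]
    # backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
    # backend_kwargs = {"variables_prefix": "my-prefix-variable", "project_id": "my-project"}

    from airflow.models import Variable

    # "toto.tata" contains dots, so the derived secret ID "my-prefix-variable-toto.tata"
    # is not a valid Secret Manager secret ID; the error is now caught and the lookup
    # falls through to the other variable sources instead of raising.
    value = Variable.get("toto.tata", default_var=None)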

potiuk commented 2 years ago

I think #18733 was released in 2.3.0

Good spot @codenamestif! It turned out that after last month's release of 2.3.0 as rc2 I set the "2.3.0" tag wrongly to rc1 - so the changes added between rc1 and rc2 were duplicated in the issue and changelog (and I missed that when I prepared it today). Those are the duplicated issues:

I already corrected the tag and the issue, and I will also correct the changelog (those entries will be missing in the docs), but unless there are other, more serious changes that force a new RC, it will remain in the package README (and technically those changes ARE in 2.4.0 as well as 2.3.0). In future packages the changelog will be corrected.

Sorry everyone involved for spamming!

Aakcht commented 2 years ago

Tested #18854 - all good.

#18331 is already present in 2.1.1 (it was tested in #18638), so I don't think hdfs 2.1.1rc1 should be present here.

JavierLopezT commented 2 years ago

#17397 and #18764 were already tested in the last wave


mariotaddeucci commented 2 years ago

#18027 is already present in 2.3.0, tested in #18638

#18671 and #18819 are all good

#18844 got errors using RedshiftSQLHook when executing multiple commands in a prepared statement. I'll fix it.

enima2684 commented 2 years ago

#18990 tested and working

enima2684 commented 2 years ago

#18872: I tested this PR and it is not working as expected.

The DockerSwarmOperator never returns and keeps hanging. I used the following job for the test:

    from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator

    task = DockerSwarmOperator(
        task_id="task",
        image="python:3.9",
        enable_logging=True,
        tty=True,
        command=["echo", "hello world !"]
    )
potiuk commented 2 years ago

#18331 is already present in 2.1.1 (it was tested in #18638), so I don't think hdfs 2.1.1rc1 should be present here.

Right - apparently the issue generation script has a bug in this case - hdfs is not being released. I will fix it. Removed it from the issue.

potiuk commented 2 years ago

#17397 and #18764 were already tested in the last wave

Correct - already removed (that was amazon's later release with a wrongly set tag - already corrected :)

potiuk commented 2 years ago

#18844 got errors using RedshiftSQLHook when executing multiple commands in a prepared statement. I'll fix it.

How serious / how much of a regression is it @mariotaddeucci?

Goodkat commented 2 years ago

#17850 was already tested within the 2.0.0rc2 release

#15016 (comment)

potiuk commented 2 years ago

#18872: I tested this PR and it is not working as expected.

@RyanSiu1995 can you please double-check if you have the same problem?

potiuk commented 2 years ago

#17850 was already tested within the 2.0.0rc2 release

Correct - exasol is not being released (same bug as with hdfs, affecting only issue generation). I will fix it. Sorry for the spam.

potiuk commented 2 years ago

I removed the few other providers which suffered from the same issue - sorry :(

fredthomsen commented 2 years ago

#18847 and #18752 for samba: 3.0.1rc1 are both good.


GuidoTournois commented 2 years ago

I have validated that both changes for pagerduty work as intended!

Brooke-white commented 2 years ago

#18447 tested and working

anaynayak commented 2 years ago

#18807 tested and verified both single and multiple S3 prefix matches.

josh-fell commented 2 years ago

Tested and verified #19052, #19062, and #19323. Thanks for organizing this release Jarek!

tnyz commented 2 years ago

tested #18676 and working

potiuk commented 2 years ago

Good progress so far :)!

mariotaddeucci commented 2 years ago

#18844 got errors using RedshiftSQLHook when executing multiple commands in a prepared statement. I'll fix it.

How serious / how much of a regression is it @mariotaddeucci?

@potiuk It happened on S3ToRedshiftOperator with a specific configuration. When using "UPSERT" or "REPLACE" it generates an SQL block with multiple queries, and RedshiftSQLHook doesn't support executing multiple queries in a single call to execute. To fix it, we just need to convert the single query string into a list of queries. The fix is available in PR #19358.
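
A minimal sketch of the idea (not the actual #19358 change; the helper name and the naive semicolon split below are assumptions):

    def split_sql_statements(sql):
        # Naively split a multi-statement SQL string on semicolons and drop empty parts.
        return [stmt.strip() for stmt in sql.split(";") if stmt.strip()]

    upsert_sql = (
        "BEGIN; "
        "DELETE FROM target USING staging WHERE target.id = staging.id; "
        "INSERT INTO target SELECT * FROM staging; "
        "COMMIT;"
    )

    # DbApiHook-based hooks generally accept a list of statements in run() and execute
    # them one by one, which avoids handing the driver a single string containing
    # several statements.
    statements = split_sql_statements(upsert_sql)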

eskarimov commented 2 years ago

Tested #19048, works correctly :)

bhavaniravi commented 2 years ago

Tested #19276 - MongoSensor picks up the new db param.

mik-laj commented 2 years ago

Provider facebook: 2.1.0rc1 Align the default version with Facebook business SDK (#18883): @RyanSiu1995

It is a breaking change, so it should go into a new release where we bump the major version.

Provider google: 6.1.0rc1 Replace default api_version of FacebookAdsReportToGcsOperator (#18996): @eladkal

It is a breaking change as well.

Provider microsoft.azure: 3.3.0rc1 update azure cosmos to latest version (#18695): @eladkal

I am not sure here, but when we changed the minimum requirements of the google libraries, we always bumped the major version. Airflow is a thin layer between libraries and user, so breaking changes propagate to Airflow very easily.

potiuk commented 2 years ago

Provider facebook: 2.1.0rc1 Align the default version with Facebook business SDK (#18883): @RyanSiu1995

It is a breaking change, so it should be in a new release.

Provider google: 6.1.0rc1 Replace default api_version of FacebookAdsReportToGcsOperator (#18996): @eladkal

It is a breaking change as well.

Provider microsoft.azure: 3.3.0rc1 update azure cosmos to latest version (#18695): @eladkal

I am not sure here, but when we changed the minimum requirements of the google libraries, we always bumped the major version. Airflow is a thin layer between libraries and user, so breaking changes propagate to Airflow very easily.

Thanks @mik-laj, I will take a look and decide.

SamWheating commented 2 years ago

regarding https://github.com/apache/airflow/pull/18992

I've tested this change in our environment with the DataflowCreateJavaJobOperator. I haven't had a chance to set up test jobs for the other operators but based on the similarities in implementation I think that this fix should be applicable to all of them.

This PR introduced a change in behaviour, but I don't think it should be considered a breaking change as it is simply returning to the original / documented behaviour.

potiuk commented 2 years ago

Provider facebook: 2.1.0rc1, Provider google: 6.1.0rc1, Provider microsoft.azure: 3.3.0rc1

Hey @mik-laj - thanks for raising your concerns. I looked closer, and I do not see those changes as really "breaking" (or at least not breaking enough, and under our control, to justify a major version change), but I am open to hearing arguments if others disagree.

Facebook

Facebook API does not follow SemVer. AT ALL.

They mostly introduce new features and bugfixes when they bump the version. More interestingly, they even CHANGE the behaviour of old API versions once that behaviour has been in place in the new versions for quite some time.

Here are the changes to insights features we are using for example:

Insights Applies to v9.0+. Will apply to all versions on May 9, 2021. IG User follower_count values now align more closely with their corresponding values displayed in the Instagram app. In addition, follower_count now returns a maximum of 30 days of data instead of 2 years.

and:

Ads Insights API Updated date_preset parameter Applies to v10.0+.

The lifetime parameter (date_preset=lifetime) is disabled and replaced with date_preset=maximum, which can be used to retrieve a maximum of 37 months of data. The API will return an error when requests contain date ranges beyond the 37-month window. For v9.0 and lower, there will be no change in functionality until May 25, 2021. At that time, date_preset=maximum will be enabled and any lifetime calls will default to maximum and return only 37 months of data.

and:

Deprecation of Store Visit Metrics Applies to 11.0+. Will apply to all versions on Sept. 6, 2021.

This means that version 6.0 of "insights", which we were using so far, had its behaviour changed anyway on Sept 6th this year to match the current 11+ behaviour :scream:

So from what we see here, Facebook has chosen a "move fast, break things" approach for their APIs. I assume Facebook users know it, and I think the approach we took is a good one - we should not really try to keep compatibility with version vN of Facebook, because even they don't do or recommend it (and we would not actually be able to do that, because they can change the behaviour without us knowing it). So keeping the approach where the API version defaults to "latest" seems like a good one, and I do not see it really breaking anything.

Cosmos

Provider microsoft.azure: 3.3.0rc1 update azure cosmos to latest version (#18695): @eladkal

I looked at the changes again, and I do not see any breaking change. We indeed sometimes (but not always) bumped major versions for our providers when underlying libraries changed, but only when they changed the Airflow API of that provider (format of data returned and similar). The sheer fact of a library bumping its version does not automatically invalidate, or force a new major version of, all the applications and libraries using it. In this particular case, we simply adapted our implementation so that the API remained the same (the breaking change which affected us was simply that a different exception is thrown - in our case we catch it and return None, and this behaviour remained), so there is no reason to bump the major version here.
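
To illustrate the kind of adaptation described above, here is a hypothetical sketch (not the actual provider code, and the exception names are made up):

    class OldSdkNotFoundError(Exception):
        # Exception the older SDK raised for a missing document (assumed name).
        pass

    class NewSdkResourceNotFoundError(Exception):
        # Exception the newer SDK raises instead (assumed name).
        pass

    def get_document(fetch):
        # The hook's public behaviour stays the same: callers still get None when
        # the document does not exist, regardless of which exception the underlying
        # library raises.
        try:
            return fetch()
        except (OldSdkNotFoundError, NewSdkResourceNotFoundError):
            return None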

potiuk commented 2 years ago

Closing this one. As described in https://lists.apache.org/thread/6tzsq6xkm5r1q6pqtyq5yhvr5p1jqd13 I am releasing the wave now, and we are removing the Amazon provider from that release due to the Redshift regression (I will release a fixed version soon). Thanks @mariotaddeucci for testing and spotting the problem (it was not obvious in the initial change and it's great you tested this case!)

Thanks everyone who helped!