apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Status of testing Providers that were prepared on August 19, 2024 #41577

Closed eladkal closed 2 months ago

eladkal commented 2 months ago

Body

I have a kind request for all the contributors to the latest provider packages release. Could you please help us to test the RC versions of the providers?

The guidelines on how to test providers can be found in

Verify providers by contributors

Let us know in a comment whether the issue is addressed.

These are the providers that require testing, as some substantial changes were introduced:

Provider amazon: 8.28.0rc1
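Before testing, it can help to confirm that the RC is what is actually installed in the environment. The following is a minimal sketch using only the Python standard library; the helper names `is_release_candidate` and `installed_version` are made up for illustration, and the distribution name assumes the usual `apache-airflow-providers-<name>` pattern.

```python
import re
from importlib import metadata
from typing import Optional

# Matches a trailing release-candidate suffix such as "rc1" in "8.28.0rc1".
_RC_SUFFIX = re.compile(r"rc\d+$")


def is_release_candidate(version: str) -> bool:
    """Return True when the version string ends with an rcN suffix."""
    return _RC_SUFFIX.search(version) is not None


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name; amazon is used here only as the example.
    found = installed_version("apache-airflow-providers-amazon")
    if found is None:
        print("provider is not installed")
    else:
        print(f"installed: {found}, release candidate: {is_release_candidate(found)}")
```

If the reported version does not end in an `rc` suffix, the environment is most likely still running the previously released provider.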

All users involved in the PRs: @morokosi @mobuchowski @borismo @dirrao @phi-friday @Ghoul-SSZ @joaopamaral @BTeclaw @potiuk @vikramaditya91 @uzhastik @ambika-garg @Owen-CH-Leung @moiseenkov @ssilb4 @got686-yandex @kaxil @vincbeck @Le

Committer

moiseenkov commented 2 months ago

Hi,

#41527, #41262 work as expected

sc250072 commented 2 months ago

Hi, since there haven’t been any recent updates for the Teradata Provider beyond version 2.5.0, could you explain why a new release version is needed?

https://pypi.org/project/apache-airflow-providers-teradata/2.6.0rc1/

phi-friday commented 2 months ago

  1. #41356 works fine.

  2. #41358: I don't use Spark, so I can't check, but given that _sql is simply an alias for sql, it should be fine.

  3. #41461: I couldn't find a good way to verify common.sql, but since we've only removed the part that generates a more detailed error message, it should be fine.
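
To illustrate why an alias like the one in item 2 carries little risk, here is a minimal sketch of a read-only alias. The class `ExampleOperator` is hypothetical and is not the provider's actual code; it only assumes that `_sql` returns whatever `sql` currently holds.

```python
class ExampleOperator:
    """Hypothetical operator, used only to illustrate attribute aliasing."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    @property
    def _sql(self) -> str:
        # Read-only alias: always reflects the current value of ``sql``.
        return self.sql


op = ExampleOperator(sql="SELECT 1")
op.sql = "SELECT 2"
print(op._sql)
```

Because the alias holds no state of its own, the two names cannot drift apart, which is why code reading either one sees the same value.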

potiuk commented 2 months ago

> Hi, since there haven’t been any recent updates for the Teradata Provider beyond version 2.5.0, could you explain why a new release version is needed?

According to our rules, we periodically bump the min-airflow version of all providers: https://github.com/apache/airflow/blob/main/PROVIDERS.rst#upgrading-minimum-supported-version-of-airflow. We then release all providers with the min-airflow version bumped, and we also remove all pre-min-airflow backports. This keeps the Airflow providers free from back-compatibility issues.

potiuk commented 2 months ago

Checked that all my changes are in.

jx2lee commented 2 months ago

@eladkal

#40008 works fine! (unit test & example run)

Owen-CH-Leung commented 2 months ago

@eladkal the ElasticSearchSQLHook is now working as expected.

image

vikramaditya91 commented 2 months ago

https://github.com/apache/airflow/pull/41256: @vikramaditya91

Works fine

kacpermuda commented 2 months ago

#41494 tried to fix the OOM error in the scheduler that OpenLineage can cause when generating a dag_tree from a huge DAG (related issue: #41505). It works, but we've just received information about another production case where the scheduler went OOM with another complex DAG. There is a fix prepared in #41587 that removes the dag_tree entirely, so there will be no more errors like this. I'd like to request an rc2 for the OpenLineage provider (@eladkal) that includes that fix, as it is a bug that can cause problems in bigger deployments.

joaopamaral commented 2 months ago

Tested https://github.com/apache/airflow/pull/40703 with both access_control formats and it's working fine:

image

eladkal commented 2 months ago

@kacpermuda I will exclude openlineage from this release

uzhastik commented 2 months ago

ydb provider works fine: https://github.com/apache/airflow/pull/41303

ambika-garg commented 2 months ago

Hi, #40356 works as expected

dirrao commented 2 months ago

Hi, https://github.com/apache/airflow/pull/41372 addresses documentation changes only, so there is no functionality change.

BTeclaw commented 2 months ago

#41150 - Works as expected - unit test + functional check, details below:

  1. Connection definition (different role and warehouse than those used in the Operator definition) (image: connection_definition)
  2. DAG definition, mind the different warehouse, role and schema (image: dag_definition)
  3. Queries executed on the warehouse defined when declaring the SnowflakeSqlApiOperator and not the one from the connection (image: proper_warehouse)
  4. DDL executing role is also properly forwarded through the SnowflakeSqlApiOperator (image: proper_owner_of_table)

perry2of5 commented 2 months ago

#41142 passes the test, but:

I noticed that starting with Airflow 2.9.3 and microsoft-azure provider 10.3.0, the return value is no longer put into XCom (it is blank in the UI). Then, with Airflow 2.10.0 and microsoft-azure provider 10.3.0, the key shows up in XCom but the UI says "No value found for XCom key". So something broke between 2.9.2/10.1.2 and 2.10.0/10.3.0.

With all that said, my actual change to put the last line (or all lines) of the logs into XCom worked. So, I think we need a new defect logged to see why the return value isn't showing up correctly any more.

Here is my operator. I'd been testing my changes with do_xcom_push=False since I didn't care about the normal return_value, which obscured the fact that something else broke :(

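  from airflow.providers.microsoft.azure.operators.container_instances import (
      AzureContainerInstancesOperator,
  )

  # _post_execute is a callback defined elsewhere in the DAG file (not shown).
  aciOperator = AzureContainerInstancesOperator(
      ci_conn_id="azure-container-instance-conn-id",
      registry_conn_id="acr-conn-id",
      resource_group="redacted",
      name="http2blob{{ ds }}",
      image="redacted",
      region="WestUS2",
      environment_variables={
          # redacted
      },
      volumes=[],
      memory_in_gb=1.0,
      cpu=1.0,
      task_id="start-download-aci",
      retries=0,
      do_xcom_push=True,  # previously False during testing, which hid the issue
      # xcom_all=True,
      post_execute=_post_execute,
  )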

Also, I did some more testing, and multiple_outputs=True also fails back in 2.9.2 with microsoft-azure provider 10.1.2. This actually makes sense because the operator returns a single value, not a dictionary, so I think this wasn't actually an issue. So, I'm saying this was tester error unless someone tells me otherwise.
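
For anyone following along, here is a simplified model of why multiple_outputs=True cannot work with a single non-dictionary return value. This is a sketch for illustration only, not Airflow's implementation; the function name `xcom_entries` is made up, and the only assumption is that multiple_outputs spreads a dictionary's items into separate XCom keys alongside the usual return_value key.

```python
from collections.abc import Mapping
from typing import Any, Dict


def xcom_entries(return_value: Any, multiple_outputs: bool) -> Dict[str, Any]:
    """Model which XCom keys a task's return value would produce."""
    # The whole return value is always stored under the default key.
    entries: Dict[str, Any] = {"return_value": return_value}
    if multiple_outputs:
        # Spreading into separate keys only makes sense for a mapping.
        if not isinstance(return_value, Mapping):
            raise TypeError(
                "expected a dictionary for multiple_outputs, "
                f"got {type(return_value).__name__}"
            )
        entries.update(return_value)
    return entries


print(xcom_entries("last log line", multiple_outputs=False))
print(xcom_entries({"rows": 3}, multiple_outputs=True))
```

Under this model, an operator that returns a plain string can only ever populate the return_value key, so a failure with multiple_outputs=True is expected rather than a regression.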

eladkal commented 2 months ago

@perry2of5 there is only one question relevant here: is there a regression in apache-airflow-providers-microsoft-azure from 10.3.0 to 10.4.0rc1? Everything else is possible bugs that do not affect our decision about releasing.

Please clarify explicitly what worked on 10.3.0 and does not work anymore on 10.4.0rc1.

perry2of5 commented 2 months ago

I did not find any regression from 10.3.0 to 10.4.0rc1.

eladkal commented 2 months ago

> I did not find any regression from 10.3.0 to 10.4.0rc1.

Then it's not blocking the release. Feel free to raise a PR to address the bugs you mentioned.

eladkal commented 2 months ago

Thank you everyone. Providers are released. The openlineage provider is excluded and will follow up with rc2.

I invite everyone to help improve providers for the next release; a list of open issues can be found here.