apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Status of testing Providers that were prepared on July 29, 2023 #32936

Closed: eladkal closed this 1 year ago

eladkal commented 1 year ago


I have a kind request for all the contributors to the latest provider packages release. Could you please help us test the RC versions of the providers?

Let us know in a comment whether the issue is addressed.

These are the providers that require testing, as some substantial changes were introduced:

Provider amazon: 8.4.0rc1

The guidelines on how to test providers can be found in "Verify providers by contributors".

All users involved in the PRs: @JDarDagran @fdemiane @ieunea1128 @kristopherkane @VladaZakharova @potiuk @Lee-W @hussein-awala @mobuchowski @charliermarsh @dwreeves @iJanki-gr @pankajastro @dashton90 @avinashpandeshwar @ivica-k @o-nikolas @rishi-kulkarni @mahammi @vandonr-amz @michalc @MaksYermak @moiseenkov @syedahsn


ivica-k commented 1 year ago

Tested the RedshiftDataOperator from the Amazon provider with this DAG:

import pendulum

from airflow import DAG
from airflow.providers.amazon.aws.operators.redshift_data import RedshiftDataOperator

with DAG(
    "test_redshift_serverless",
    start_date=pendulum.datetime(2023, 3, 1, tz="UTC"),
    schedule="0 0 1,15 * *",  # midnight on the 1st and 15th of each month
    catchup=False,
) as dag:
    # Run a trivial query against the Redshift Serverless workgroup, wait for
    # the statement to finish, and return its result set.
    rd = RedshiftDataOperator(
        task_id="run_this",
        database="dev",
        workgroup_name="ivica",
        sql="select current_user;",
        aws_conn_id="aws_default",
        wait_for_completion=True,
        return_sql_result=True,
    )

Query result is:

{
    'ColumnMetadata': [{
        'isCaseSensitive': True,
        'isCurrency': False,
        'isSigned': False,
        'label': 'current_user',
        'length': 0,
        'name': 'current_user',
        'nullable': 1,
        'precision': 63,
        'scale': 0,
        'schemaName': '',
        'tableName': '',
        'typeName': 'bpchar'
    }],
    'Records': [
        [{
            'stringValue': 'IAM:ikolenkas'
        }]
    ],
    'TotalNumRows': 1,
    'ResponseMetadata': {
        'RequestId': '8188e2dc-7129-4614-b630-36707a8a7669',
        'HTTPStatusCode': 200,
        'HTTPHeaders': {
            'x-amzn-requestid': '8188e2dc-7129-4614-b630-36707a8a7669',
            'content-type': 'application/x-amz-json-1.1',
            'content-length': '289',
            'date': 'Sat, 29 Jul 2023 15:05:51 GMT'
        },
        'RetryAttempts': 0
    }
}

To me this looks like a working operator, including the changes I added. Let me know if you expect a more thorough test.
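The operator returned the raw Redshift Data API result shown above. A minimal sketch of turning that structure into plain rows, assuming the ColumnMetadata/Records layout in the output above; records_to_rows is a hypothetical helper, not part of the Amazon provider, and it ignores pagination (NextToken):

```python
from __future__ import annotations

from typing import Any


def records_to_rows(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a Redshift Data API style result into a list of {column: value} dicts."""
    names = [col["name"] for col in result["ColumnMetadata"]]
    rows = []
    for record in result["Records"]:
        # Each field is a one-key dict such as {"stringValue": "..."} or
        # {"longValue": 1}; take whichever value is present. A SQL NULL is
        # reported as {"isNull": True} and is mapped to None here.
        values = [
            None if field.get("isNull") else next(iter(field.values()))
            for field in record
        ]
        rows.append(dict(zip(names, values)))
    return rows


# Trimmed-down copy of the query result above.
sample = {
    "ColumnMetadata": [{"name": "current_user"}],
    "Records": [[{"stringValue": "IAM:ikolenkas"}]],
    "TotalNumRows": 1,
}
print(records_to_rows(sample))  # [{'current_user': 'IAM:ikolenkas'}]
```

Keying each row by column name keeps the result readable even when the query returns several columns of different types.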

rishi-kulkarni commented 1 year ago

I can confirm that deferrable operators work in combination with assume_role, per the fix I made in #32733. I tested this by using the BatchOperator with deferrable=True and a connection that uses STS to assume a role in another account.
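For reference, a cross-account connection like the one described is typically configured through the AWS connection's role_arn extra, which makes the hook call STS AssumeRole. A minimal sketch, assuming Airflow 2.3+ (which accepts JSON-serialized connections in AIRFLOW_CONN_* environment variables); the connection ID, account ID, and role name below are hypothetical placeholders, and the actual connection used in this test is not shown:

```python
import json
import os

# Hypothetical connection ID and role ARN; replace with your own values.
CONN_ID = "aws_cross_account"
ROLE_ARN = "arn:aws:iam::123456789012:role/airflow-batch"

conn = {
    "conn_type": "aws",
    # No login/password here, so base credentials come from boto3's default
    # credential chain; role_arn then makes the hook assume the target role
    # via STS to obtain temporary credentials for the other account.
    "extra": {
        "role_arn": ROLE_ARN,
        "region_name": "us-east-1",
    },
}

# Airflow reads a connection from AIRFLOW_CONN_<CONN_ID> (upper-cased).
env_name = f"AIRFLOW_CONN_{CONN_ID.upper()}"
os.environ[env_name] = json.dumps(conn)
print(env_name)  # AIRFLOW_CONN_AWS_CROSS_ACCOUNT
```

A BatchOperator task with deferrable=True would then reference this connection through aws_conn_id="aws_cross_account".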

Lee-W commented 1 year ago

We have tested the following providers with our example DAGs without encountering any issues:

apache-airflow-providers-amazon
apache-airflow-providers-apache-hive
apache-airflow-providers-cncf-kubernetes
apache-airflow-providers-databricks
apache-airflow-providers-google
apache-airflow-providers-snowflake
apache-airflow-providers-sftp
apache-airflow-providers-microsoft-azure

potiuk commented 1 year ago

Checked that all the changes I added are included; they import and install nicely. I've also run main Airflow with the new providers installed from PyPI, including running them with the Celery Executor, and they seem to work nicely together.

I also added a PR with instructions showing how easy it is to use Breeze features to run such tests, where you can combine, for example, Airflow "2.7.0dev0" built from sources with release candidates of a few selected providers downloaded from PyPI. It's just a few commands to execute: https://github.com/apache/airflow/pull/32948. Not everyone knows that you can get Airflow up and running (start-airflow with release candidate providers downloaded from PyPI) in literally less than a minute:

[Screenshot 2023-07-30 at 13:00:35]

hussein-awala commented 1 year ago

I checked #32768 and it's included in the RC. It's difficult to reproduce the error and test the fix.

JDarDagran commented 1 year ago

I checked my changes in the following providers by running example DAGs and seeing OpenLineage events in the logs, as designed:

apache-airflow-providers-common-sql==1.6.1rc1
apache-airflow-providers-mysql==5.2.0rc1
apache-airflow-providers-openlineage==1.0.0rc1
apache-airflow-providers-postgres==5.6.0rc1
apache-airflow-providers-snowflake==4.4.0rc1

mobuchowski commented 1 year ago

I've found one bug for https://github.com/apache/airflow/pull/31350, fixing in: https://github.com/apache/airflow/pull/32956

OpenLineage changes in sftp 4.5.0rc1 work as intended. Will check more tomorrow.

ieunea1128 commented 1 year ago

I've tested #32664 with the following connection, and it works without errors.

from airflow.models.connection import Connection

conn = Connection(
    conn_id="sample_aws_connection",
    conn_type="aws",
    login="access_key",  # AWS access key ID
    password="secret_key",  # AWS secret access key
    extra={
        "test_endpoint_url": "https://sts.us-east-1.amazonaws.com",
        "region_name": "us-east-1",
    },
)

A unit test was also added in #32664 to validate it.

iJanki-gr commented 1 year ago

https://github.com/apache/airflow/pull/32885 works as expected.

potiuk commented 1 year ago

> I've found one bug for #31350, fixing in: #32956
>
> OpenLineage changes in sftp 4.5.0rc1 work as intended. Will check more tomorrow.

I guess it's not a blocker, @mobuchowski?

mobuchowski commented 1 year ago

@potiuk no, I believe it's not.

fdemiane commented 1 year ago

#32673 works as expected: the GKEStartPodOperator no longer fails after the one-hour mark.

The following parameters were passed to the operator for testing: cmds=['sleep'], arguments=['4000']
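Why these parameters exercise the fix: GKEStartPodOperator extends KubernetesPodOperator, which passes cmds as the container's command and arguments as its args, so the pod simply runs sleep for 4000 seconds. A small sketch of that mapping and the arithmetic, using only the two values quoted above:

```python
# Parameters passed to GKEStartPodOperator in the test above.
cmds = ["sleep"]
arguments = ["4000"]

# In Kubernetes, cmds becomes the container's `command` (overriding the image
# ENTRYPOINT) and arguments becomes `args`; the container runs command + args.
effective_command = cmds + arguments
print(" ".join(effective_command))  # sleep 4000

# The pod therefore runs for 4000 s, i.e. past the one-hour (3600 s) mark.
seconds = int(arguments[0])
minutes, secs = divmod(seconds, 60)
hours, minutes = divmod(minutes, 60)
print(f"{hours}h {minutes}m {secs}s, exceeds 1h: {seconds > 3600}")  # 1h 6m 40s, exceeds 1h: True
```

Any duration comfortably above 3600 s would serve the same purpose; 4000 s keeps the test short while clearly crossing the one-hour boundary.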

eladkal commented 1 year ago

> I've found one bug for https://github.com/apache/airflow/pull/31350, fixing in: https://github.com/apache/airflow/pull/32956

This doesn't seem critical and is very localized. I will not cut RC2 for this, as it would delay the Airflow 2.7.0 release further. It will wait for the next provider release (we can have another wave next week).

dashton90 commented 1 year ago

#32558 works as expected.

pankajastro commented 1 year ago

https://github.com/apache/airflow/pull/31591 looks good. Thanks!

moiseenkov commented 1 year ago

#31925 is working well.

vandonr-amz commented 1 year ago

good for me

moiseenkov commented 1 year ago

Changes for #31471, #31644, and #32749 are working as expected; however, some system tests are failing. We're fixing them.

apache-airflow-providers-amazon==8.4.0rc1
apache-airflow-providers-apache-beam==5.2.0rc1
apache-airflow-providers-celery==3.3.0rc1
apache-airflow-providers-cncf-kubernetes==7.4.0rc1
apache-airflow-providers-google==10.5.0rc1
apache-airflow-providers-postgres==5.6.0rc1
apache-airflow-providers-mysql==5.2.0rc1
apache-airflow-providers-ssh==3.7.1
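To reproduce an environment like the one listed above, the pins can be installed together with pip. A minimal sketch (parse_pins and RC_SUFFIX are illustrative names, not Airflow tooling) that parses plain name==version lines, reports which pins lack an rcN pre-release suffix, and prints a single pip install command; it does not handle extras, environment markers, or other specifier operators:

```python
from __future__ import annotations

import re

# The pins listed in the comment above.
PINS = """\
apache-airflow-providers-amazon==8.4.0rc1
apache-airflow-providers-apache-beam==5.2.0rc1
apache-airflow-providers-celery==3.3.0rc1
apache-airflow-providers-cncf-kubernetes==7.4.0rc1
apache-airflow-providers-google==10.5.0rc1
apache-airflow-providers-postgres==5.6.0rc1
apache-airflow-providers-mysql==5.2.0rc1
apache-airflow-providers-ssh==3.7.1
"""

# Release candidates carry an "rcN" pre-release suffix, e.g. 8.4.0rc1.
RC_SUFFIX = re.compile(r"rc\d+$")


def parse_pins(text: str) -> dict[str, str]:
    """Map package name to pinned version; handles only plain 'name==version' lines."""
    pins: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins


pins = parse_pins(PINS)
non_rc = sorted(name for name, version in pins.items() if not RC_SUFFIX.search(version))
print(non_rc)  # ['apache-airflow-providers-ssh']

# One pip command that installs exactly these versions.
print("pip install " + " ".join(f"{name}=={version}" for name, version in pins.items()))
```

Pinning the exact rcN version is what lets pip pick up a pre-release without extra flags.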

VladaZakharova commented 1 year ago

Some problems were found in https://github.com/apache/airflow/pull/32749: incorrect naming of the resources led to resources being deleted before some actions had actually finished. This is fixed in https://github.com/apache/airflow/pull/32855.

eladkal commented 1 year ago

> Some problems were found in https://github.com/apache/airflow/pull/32749: incorrect naming of the resources led to resources being deleted before some actions had actually finished.

Can you please clarify what the regression is? From the description, it sounds like there is a bug that https://github.com/apache/airflow/pull/32749 tried to solve but didn't completely, and https://github.com/apache/airflow/pull/32855 is the full fix. Is this right? If so, this will not hold the release, as there is no regression: the current RC did not make the problem worse. If this is an accurate description, I prefer to cut a follow-up release wave after this one is finished rather than having an RC2.

VladaZakharova commented 1 year ago

> Some problems were found in #32749: incorrect naming of the resources led to resources being deleted before some actions had actually finished.
>
> Can you please clarify what the regression is? From the description, it sounds like there is a bug that #32749 tried to solve but didn't completely, and #32855 is the full fix. Is this right? If so, this will not hold the release, as there is no regression: the current RC did not make the problem worse. If this is an accurate description, I prefer to cut a follow-up release wave after this one is finished rather than having an RC2.

Yes, you are right. Okay then, let's continue with the current RC; I would expect https://github.com/apache/airflow/pull/32855 in the next release. Thank you!

eladkal commented 1 year ago

Thank you, everyone. The providers are released. I invite everyone to help improve providers for the next release; a list of open issues can be found here.