apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.13k stars 14.31k forks source link

Google Cloud Transfer operators: bug in transfer service update job operator #22021

Open abhinavraj23 opened 2 years ago

abhinavraj23 commented 2 years ago

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

6.4.0

Apache Airflow version

2.1.4

Operating System

linux

Deployment

Composer

Deployment details

No response

What happened

For the airflow operators made corresponding to google cloud storage transfer service, the update transfer job isn't working correctly in case the source is AWS S3, the reason for the same is the TransferJobPreprocessor function. This function injects AWS credentials incase there's an AWS_S3_DATA_SOURCE provided, but the issue with this is that the function expects transfer body and currently the update request body is passed in the function rather than the transfer body, here the documentation link which shows how a request body for updating a transfer job is different than the transfer job object, the request body for update transfer job contains the transfer job in the json object which should be passed in the TransferJobPreprocessor function and after which the AWS credentials would be injected correctly and hence the operator would work as expected.

What you expected to happen

The update transfer job operator should work as expected in case the source is AWS S3.

How to reproduce

Configure a google storage transfer job with S3 being the data source and try to update the transfer in airflow using the UpdateJobOperator. It would throw an error stating AWS credentials are absent in the transfer body.

Anything else

No response

Are you willing to submit PR?

Code of Conduct

potiuk commented 2 years ago

Feel free to fix!

shahar1 commented 1 year ago

@eladkal Please assign me