[2024-09-21, 21:05:58 UTC] {taskinstance.py:1328} INFO - Executing <Task(BeamRunPythonPipelineOperator): submit_beam_job> on 2022-01-01 00:00:00+00:00
[2024-09-21, 21:05:58 UTC] {standard_task_runner.py:57} INFO - Started process 2105 to run task
[2024-09-21, 21:05:58 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'dag-example-beam-dataflow-python', 'submit_beam_job', 'scheduled__2022-01-01T00:00:00+00:00', '--job-id', '716', '--raw', '--subdir', 'DAGS_FOLDER/dag_example_beam_dataflow_python/dag_example_beam_dataflow_python.py', '--cfg-path', '/tmp/tmprcjz83eb']
[2024-09-21, 21:05:58 UTC] {standard_task_runner.py:85} INFO - Job 716: Subtask submit_beam_job
[2024-09-21, 21:05:58 UTC] {task_command.py:414} INFO - Running <TaskInstance: dag-example-beam-dataflow-python.submit_beam_job scheduled__2022-01-01T00:00:00+00:00 [running]> on host 9743bdb39e14
[2024-09-21, 21:05:58 UTC] {taskinstance.py:1547} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='dag-example-beam-dataflow-python' AIRFLOW_CTX_TASK_ID='submit_beam_job' AIRFLOW_CTX_EXECUTION_DATE='2022-01-01T00:00:00+00:00' AIRFLOW_CTX_TRY_NUMBER='3' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2022-01-01T00:00:00+00:00'
[2024-09-21, 21:05:58 UTC] {crypto.py:83} WARNING - empty cryptography key - values will not be stored encrypted.
[2024-09-21, 21:05:58 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
[2024-09-21, 21:05:58 UTC] {beam.py:198} INFO - {'job_name': 'simple-beam-job-7d346a20', 'project': 'XXX', 'region': 'europe-west1', 'labels': {'airflow-version': 'v2-6-3-composer'}}
[2024-09-21, 21:05:59 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
[2024-09-21, 21:05:59 UTC] {credentials_provider.py:353} INFO - Getting connection using `google.auth.default()` since no explicit credentials are provided.
[2024-09-21, 21:05:59 UTC] {logging_mixin.py:150} WARNING - /opt/python3.8/lib/python3.8/site-packages/google/auth/_default.py:78 UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. See the following page for troubleshooting: https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds.
[2024-09-21, 21:06:05 UTC] {beam.py:271} INFO - Beam version: 2.50.0
[2024-09-21, 21:06:05 UTC] {beam.py:131} INFO - Running command: python3 /home/airflow/gcs/dags/dag_example_beam_dataflow_python/src/beam/job_example_beam_dataflow_python.py --runner=DataflowRunner --job_name=simple-beam-job-7d346a20 --project=XXX --region=europe-west1 --labels=airflow-version=v2-6-3-composer --worker_machine_type=n1-standard-1 --disk_size_gb=10 --num_workers=1
[2024-09-21, 21:06:05 UTC] {beam.py:142} INFO - Start waiting for Apache Beam process to complete.
[2024-09-21, 21:06:07 UTC] {beam.py:113} INFO - 0
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO - Generating new OAuth credentials ...
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO -
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO - Your browser has been opened to visit:
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO -
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO - https://accounts.google.com/o/oauth2/v2/auth?client_id=XXXX.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8090%2F&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute.readonly+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&access_type=offline&response_type=code
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO -
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO - If your browser is on a different machine then exit and re-run this
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO - application with the command-line parameter
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO -
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO - --noauth_local_webserver
[2024-09-21, 21:06:10 UTC] {beam.py:113} INFO -
Hi @fpopic, as you checked "Yes I am willing to submit a PR!", I'll assign this to you. But please let us know if you are no longer interested in it. Thanks!
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.6.3 (the problem occurs in the latest version as well; I will try the latest version and post its log too)
What happened?
When manually submitting an Apache Beam Python job to the Google Dataflow runner using BeamRunPythonPipelineOperator:

- without setting BeamRunPythonPipelineOperator.pipeline_options.service_account_name
- without the GOOGLE_APPLICATION_CREDENTIALS environment variable
- with the GCP_PROJECT environment variable set
- after running:
  - gcloud auth application-default login --disable-quota-project
  - gcloud auth login
  - gcloud config set project <project>
- with the google_cloud_default connection with the following content
- with BeamRunPythonPipelineOperator.DataflowConfiguration.project_id set

the task gets stuck in apitools code that uses oauth2client and tries to initiate a browser-based Google sign-in, which must fail. I don't understand why the authentication flow ends up in that execution branch, since a credential of type authorized_user exists at the well-known path ~/.config/gcloud/application_default_credentials.json
What you think should happen instead?
Airflow should submit the job to Dataflow using Application Default Credentials, the same way standalone Apache Beam Python (without Airflow) does.
I see that Apache Beam already solved that problem in https://github.com/apache/beam/pull/15004; hence running Apache Beam Python without Airflow using ADC works.
How to reproduce
Prepare env. variables:
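The actual variable values are collapsed in the issue as rendered; a minimal sketch of the setup described above (variable names are taken from the description, values are placeholders):

```shell
# Placeholders only: substitute your own project id.
export GCP_PROJECT=my-gcp-project

# Deliberately NOT setting GOOGLE_APPLICATION_CREDENTIALS: the point of the
# report is that ADC from ~/.config/gcloud/application_default_credentials.json
# should be picked up instead.

gcloud auth application-default login --disable-quota-project
gcloud auth login
gcloud config set project "$GCP_PROJECT"
```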
and execute the following DAG
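The DAG itself is collapsed in the issue as rendered. A hypothetical minimal DAG consistent with the log output above (operator and DataflowConfiguration names exist in apache-airflow-providers-apache-beam==5.3.0; the dag_id, task_id, path, and sizing values are read off the log, and the schedule is an assumption):

```python
# Hypothetical reconstruction of the collapsed DAG; ids and paths are taken
# from the task log above, everything else is a plausible placeholder.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowConfiguration

with DAG(
    dag_id="dag-example-beam-dataflow-python",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # assumption: trigger manually to reproduce
    catchup=False,
) as dag:
    submit_beam_job = BeamRunPythonPipelineOperator(
        task_id="submit_beam_job",
        runner="DataflowRunner",
        # Path as shown in the "Running command" log line above.
        py_file="/home/airflow/gcs/dags/dag_example_beam_dataflow_python/src/beam/job_example_beam_dataflow_python.py",
        pipeline_options={
            "worker_machine_type": "n1-standard-1",
            "disk_size_gb": 10,
            "num_workers": 1,
        },
        dataflow_config=DataflowConfiguration(
            job_name="simple-beam-job",
            project_id="XXX",  # redacted in the original report
            location="europe-west1",
        ),
    )
```

Note that no service_account_name, GOOGLE_APPLICATION_CREDENTIALS, or explicit key file is configured anywhere, so the operator is expected to fall back to ADC.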
and Apache Beam job source code
job_example_beam_dataflow_python_adc.py
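The job source is also collapsed in the issue as rendered. A minimal stand-in pipeline matching the filename above (the real pipeline's transforms are unknown; only the option handling matters for reproducing the authentication issue):

```python
# Hypothetical stand-in for the collapsed job_example_beam_dataflow_python_adc.py;
# any trivial pipeline triggers the same job-submission (and thus auth) path.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # PipelineOptions reads --runner, --project, --region, etc. from argv;
    # credentials are expected to come from Application Default Credentials.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Create" >> beam.Create(["hello", "world"])
            | "Print" >> beam.Map(print)
        )


if __name__ == "__main__":
    run()
```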
Operating System
macOS 14.7
Versions of Apache Airflow Providers
apache-airflow-providers-apache-beam==5.3.0
Deployment
Docker Airflow / Composer Airflow (it doesn't matter; the problem occurs in the latest version as well).
Deployment details
Anything else?
No response
Are you willing to submit PR?
Code of Conduct