Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.
I am pretty sure that if I pass the project to the TabularDataset and change the code from

TabularDataset(dataset_name=self.dataset_id)

to

TabularDataset(
    dataset_name=self.dataset_id,
    project_id=self.project_id,
)

it would pick up the project ID, but I still don't see how to pass the credentials / GCP connection object. If I had that information, I could create a PR for this and for the other training tasks that similarly need to be updated:
Line 96: datasets.TimeSeriesDataset(dataset_name=self.dataset_id)
Line 152: datasets.ImageDataset(dataset_name=self.dataset_id)
Line 204: datasets.TabularDataset(dataset_name=self.dataset_id)
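For reference, a minimal sketch of the kind of change being discussed, assuming the aiplatform dataset classes accept project and credentials keyword arguments and that the operator's hook exposes get_credentials() (as Google provider hooks do); the actual change in the provider may differ:

# Hypothetical sketch, not the committed fix: pass the connection's
# project and credentials through instead of relying on default credentials.
dataset = datasets.TabularDataset(
    dataset_name=self.dataset_id,
    project=self.project_id,
    credentials=self.hook.get_credentials(),
)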
Could you check whether #31991 fixes your issue?
I tested it; it worked fine.
Apache Airflow version
Other Airflow 2 version (please specify below): 2.6.1
What happened
I encountered an issue while running an AutoML task. The task failed with an authentication error because the project ID could not be found. Here are the details of the error:
What you think should happen instead
Expected Behavior: the AutoML task should execute successfully, authenticating with the project ID and credentials supplied by the gcp_conn_id provided in the DAG.
Actual Behavior: the task fails with an authentication error because the project ID and default credentials cannot be found.
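For context, Google provider operators are expected to resolve credentials through the Airflow connection rather than through application-default credentials; roughly (the connection ID here is a placeholder):

from airflow.providers.google.common.hooks.base_google import GoogleBaseHook

# The connection configured as gcp_conn_id should supply both pieces:
hook = GoogleBaseHook(gcp_conn_id="google_cloud_default")
credentials, project_id = hook.get_credentials_and_project_id()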
How to reproduce
To reproduce the issue and execute the CreateAutoMLTabularTrainingJobOperator task in Apache Airflow, follow these steps:
Ensure that Apache Airflow is installed. If not, run the following command to install it:
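For example (assuming a pip-based virtualenv install matching the versions reported below):

pip install "apache-airflow==2.6.1" "apache-airflow-providers-google==10.1.1"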
Create an instance of the CreateAutoMLTabularTrainingJobOperator within the DAG context:
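A minimal sketch of such a DAG; the display name, dataset ID, target column, region, and project are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.vertex_ai.auto_ml import (
    CreateAutoMLTabularTrainingJobOperator,
)

with DAG(
    dag_id="vi_create_auto_ml_tabular_training_job_dag",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
):
    auto_ml_tabular_task = CreateAutoMLTabularTrainingJobOperator(
        task_id="auto_ml_tabular_task",
        display_name="auto-ml-tabular-training-job",   # placeholder
        optimization_prediction_type="classification",
        dataset_id="1234567890",                       # placeholder dataset ID
        target_column="target",                        # placeholder column
        region="us-central1",                          # placeholder region
        project_id="my-gcp-project",                   # placeholder project
        gcp_conn_id="google_cloud_default",
    )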
Start the Apache Airflow scheduler and webserver. Open a terminal or command prompt and run the following commands:
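In separate terminals:

airflow scheduler
airflow webserver --port 8080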
Access the Apache Airflow web UI by opening a web browser and navigating to http://localhost:8080. Ensure that the scheduler and webserver are running without any errors.
Navigate to the DAGs page in the Airflow UI and locate the vi_create_auto_ml_tabular_training_job_dag DAG. Trigger the DAG manually, either by clicking the "Trigger DAG" button or using the Airflow CLI command.
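For example, via the CLI:

airflow dags trigger vi_create_auto_ml_tabular_training_job_dag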
Monitor the DAG execution status and check if the auto_ml_tabular_task completes successfully or encounters any errors.
Operating System
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04 LTS"
Versions of Apache Airflow Providers
$ pip freeze aiofiles==23.1.0 aiohttp==3.8.4 aiosignal==1.3.1 alembic==1.11.1 anyio==3.7.0 apache-airflow==2.6.1 apache-airflow-providers-common-sql==1.5.1 apache-airflow-providers-ftp==3.4.1 apache-airflow-providers-google==10.1.1 apache-airflow-providers-http==4.4.1 apache-airflow-providers-imap==3.2.1 apache-airflow-providers-sqlite==3.4.1 apispec==5.2.2 argcomplete==3.1.1 asgiref==3.7.2 async-timeout==4.0.2 attrs==23.1.0 Babel==2.12.1 backoff==2.2.1 blinker==1.6.2 cachelib==0.9.0 cachetools==5.3.1 cattrs==23.1.2 certifi==2023.5.7 cffi==1.15.1 chardet==5.1.0 charset-normalizer==3.1.0 click==8.1.3 clickclick==20.10.2 colorama==0.4.6 colorlog==4.8.0 ConfigUpdater==3.1.1 connexion==2.14.2 cron-descriptor==1.4.0 croniter==1.4.1 cryptography==41.0.1 db-dtypes==1.1.1 Deprecated==1.2.14 dill==0.3.6 dnspython==2.3.0 docutils==0.20.1 email-validator==1.3.1 exceptiongroup==1.1.1 Flask==2.2.5 Flask-AppBuilder==4.3.0 Flask-Babel==2.0.0 Flask-Caching==2.0.2 Flask-JWT-Extended==4.5.2 Flask-Limiter==3.3.1 Flask-Login==0.6.2 flask-session==0.5.0 Flask-SQLAlchemy==2.5.1 Flask-WTF==1.1.1 frozenlist==1.3.3 future==0.18.3 gcloud-aio-auth==4.2.1 gcloud-aio-bigquery==6.3.0 gcloud-aio-storage==8.2.0 google-ads==21.2.0 google-api-core==2.11.1 google-api-python-client==2.89.0 google-auth==2.20.0 google-auth-httplib2==0.1.0 google-auth-oauthlib==1.0.0 google-cloud-aiplatform==1.26.0 google-cloud-appengine-logging==1.3.0 google-cloud-audit-log==0.2.5 google-cloud-automl==2.11.1 google-cloud-bigquery==3.11.1 google-cloud-bigquery-datatransfer==3.11.1 google-cloud-bigquery-storage==2.20.0 google-cloud-bigtable==2.19.0 google-cloud-build==3.16.0 google-cloud-compute==1.11.0 google-cloud-container==2.24.0 google-cloud-core==2.3.2 google-cloud-datacatalog==3.13.0 google-cloud-dataflow-client==0.8.3 google-cloud-dataform==0.5.1 google-cloud-dataplex==1.5.0 google-cloud-dataproc==5.4.1 google-cloud-dataproc-metastore==1.11.0 google-cloud-dlp==3.12.1 google-cloud-kms==2.17.0 google-cloud-language==2.10.0 google-cloud-logging==3.5.0 google-cloud-memcache==1.7.1 google-cloud-monitoring==2.15.0 google-cloud-orchestration-airflow==1.9.0 google-cloud-os-login==2.9.1 google-cloud-pubsub==2.17.1 google-cloud-redis==2.13.0 google-cloud-resource-manager==1.10.1 google-cloud-secret-manager==2.16.1 google-cloud-spanner==3.36.0 google-cloud-speech==2.20.0 google-cloud-storage==2.9.0 google-cloud-tasks==2.13.1 google-cloud-texttospeech==2.14.1 google-cloud-translate==3.11.1 google-cloud-videointelligence==2.11.2 google-cloud-vision==3.4.2 google-cloud-workflows==1.10.1 google-crc32c==1.5.0 google-resumable-media==2.5.0 googleapis-common-protos==1.59.1 graphviz==0.20.1 greenlet==2.0.2 grpc-google-iam-v1==0.12.6 grpcio==1.54.2 grpcio-gcp==0.2.2 grpcio-status==1.54.2 gunicorn==20.1.0 h11==0.14.0 httpcore==0.17.2 httplib2==0.22.0 httpx==0.24.1 idna==3.4 importlib-metadata==4.13.0 importlib-resources==5.12.0 inflection==0.5.1 itsdangerous==2.1.2 Jinja2==3.1.2 json-merge-patch==0.2 jsonschema==4.17.3 lazy-object-proxy==1.9.0 limits==3.5.0 linkify-it-py==2.0.2 lockfile==0.12.2 looker-sdk==23.10.0 Mako==1.2.4 Markdown==3.4.3 markdown-it-py==3.0.0 MarkupSafe==2.1.3 marshmallow==3.19.0 marshmallow-enum==1.5.1 marshmallow-oneofschema==3.0.1 marshmallow-sqlalchemy==0.26.1 mdit-py-plugins==0.4.0 mdurl==0.1.2 multidict==6.0.4 numpy==1.24.3 oauthlib==3.2.2 ordered-set==4.1.0 packaging==23.1 pandas==2.0.2 pandas-gbq==0.19.2 pathspec==0.9.0 pendulum==2.1.2 pkgutil-resolve-name==1.3.10 pluggy==1.0.0 prison==0.2.1 proto-plus==1.22.2 protobuf==4.23.3 
psutil==5.9.5 pyarrow==12.0.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycparser==2.21 pydantic==1.10.9 pydata-google-auth==1.8.0 Pygments==2.15.1 PyJWT==2.7.0 pyOpenSSL==23.2.0 pyparsing==3.0.9 pyrsistent==0.19.3 python-daemon==3.0.1 python-dateutil==2.8.2 python-nvd3==0.15.0 python-slugify==8.0.1 pytz==2023.3 pytzdata==2020.1 PyYAML==6.0 requests==2.31.0 requests-oauthlib==1.3.1 requests-toolbelt==1.0.0 rfc3339-validator==0.1.4 rich==13.4.2 rich-argparse==1.1.1 rsa==4.9 setproctitle==1.3.2 Shapely==1.8.5.post1 six==1.16.0 sniffio==1.3.0 SQLAlchemy==1.4.48 sqlalchemy-bigquery==1.6.1 SQLAlchemy-JSONField==1.0.1.post0 SQLAlchemy-Utils==0.41.1 sqlparse==0.4.4 tabulate==0.9.0 tenacity==8.2.2 termcolor==2.3.0 text-unidecode==1.3 typing-extensions==4.6.3 tzdata==2023.3 uc-micro-py==1.0.2 unicodecsv==0.14.1 uritemplate==4.1.1 urllib3==2.0.3 Werkzeug==2.3.6 wrapt==1.15.0 WTForms==3.0.1 yarl==1.9.2 zipp==3.15.0
Deployment
Virtualenv installation
Deployment details
$ airflow info
Apache Airflow
version                | 2.6.1
executor               | SequentialExecutor
task_logging_handler   | airflow.utils.log.file_task_handler.FileTaskHandler
sql_alchemy_conn       | sqlite:////home/test1/airflow/airflow.db
dags_folder            | /mnt/d/projects/airflow/dags
plugins_folder         | /home/test1/airflow/plugins
base_log_folder        | /mnt/d/projects/airflow/logs
remote_base_log_folder |

System info
OS              | Linux
architecture    | x86_64
uname           | uname_result(system='Linux', node='DESKTOP-EIFUHU2', release='4.4.0-19041-Microsoft', version='#1237-Microsoft Sat Sep 11 14:32:00 PST 2021', machine='x86_64', processor='x86_64')
locale          | ('en_US', 'UTF-8')
python_version  | 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0]
python_location | /mnt/d/projects/tvenv/bin/python3

Tools info
git             | git version 2.25.1
ssh             | OpenSSH_8.2p1 Ubuntu-4, OpenSSL 1.1.1f 31 Mar 2020
kubectl         | NOT AVAILABLE
gcloud          | NOT AVAILABLE
cloud_sql_proxy | NOT AVAILABLE
mysql           | NOT AVAILABLE
sqlite3         | NOT AVAILABLE
psql            | NOT AVAILABLE

Paths info
airflow_home    | /home/test1/airflow
system_path     | /mnt/d/projects/tvenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/mnt/c/Program Files (x86)/Microsoft SDKs/Azure/CLI2/wbin:/mnt/c/Program Files/Python39/Scripts/:/mnt/c/Program Files/Python39/:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0/:/mnt/c/Windows/System32/OpenSSH/:/mnt/c/Users/ibrez/AppData/Roaming/nvm:/mnt/c/Program Files/nodejs:/mnt/c/Program Files/dotnet/:/mnt/c/Windows/system32/config/systemprofile/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/test1/AppData/Local/Microsoft/WindowsApps:/snap/bin
python_path     | /mnt/d/projects/tvenv/bin:/usr/lib/python38.zip:/usr/lib/python3.8:/usr/lib/python3.8/lib-dynload:/mnt/d/projects/tvenv/lib/python3.8/site-packages:/mnt/d/projects/airflow/dags:/home/test1/airflow/config:/home/test1/airflow/plugins
airflow_on_path | True

Providers info
apache-airflow-providers-common-sql | 1.5.1
apache-airflow-providers-ftp        | 3.4.1
apache-airflow-providers-google     | 10.1.1
apache-airflow-providers-http       | 4.4.1
apache-airflow-providers-imap       | 3.2.1
apache-airflow-providers-sqlite     | 3.4.1
Anything else
To me, it seems the issue is at the line linked here: details like the project ID and credentials are not being passed to the TabularDataset class, which causes authentication failures further down the line.
Are you willing to submit PR?
Code of Conduct