GoogleCloudPlatform / training-data-analyst

Labs and demos for courses for GCP Training (http://cloud.google.com/training).
Apache License 2.0

Fix error when './data' directory already exists #2623

Closed yrribeiro closed 1 month ago

yrribeiro commented 1 month ago

Previously, the script attempted to create the './data' directory without checking whether it already existed, which raised an [Errno 17] File exists error. That error stopped the data download step, so no data was uploaded to the bucket. The script then failed a second time when BigQuery tried to load and transform the data ([ERROR] An exception occurred: 404 Not found: URI gs://PROJECT_ID-bucket/data/online_retail.csv).
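
As a rough illustration of the kind of guard described above, here is a minimal sketch using the standard library. The helper name ensure_data_dir is hypothetical and does not come from data_download.py; the actual change in this pull request may be written differently.

```python
import os


def ensure_data_dir(path="./data"):
    """Create the directory if it is missing; do nothing if it already exists.

    Hypothetical helper for illustration only, not the code from this PR.
    """
    # exist_ok=True suppresses FileExistsError (Errno 17) when the directory
    # is already present. It still raises if `path` exists as a regular file.
    os.makedirs(path, exist_ok=True)
    return path
```

With a guard like this, running the script a second time in the same working directory should no longer stop at the directory-creation step, so the download and upload can proceed.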

Full error traceback:

2024-05-20 13:06:22,424 [ERROR] An exception occurred: [Errno 17] File exists: './data'
2024-05-20 13:06:22,433 [INFO] Initializing BigQuery dataset.
2024-05-20 13:06:22,586 [WARNING] Dataset online_retail already exists, not creating.
2024-05-20 13:06:23,247 [INFO] BQ raw dataset load job starting...
2024-05-20 13:06:24,034 [ERROR] An exception occurred: 404 Not found: URI gs://qwiklabs-gcp-00-3f0cc69e5b28-bucket/data/online_retail.csv; reason: notFound, message: Not found: URI gs://qwiklabs-gcp-00-3f0cc69e5b28-bucket/data/online_retail.csv
Traceback (most recent call last):
  File "/home/jupyter/training-data-analyst/self-paced-labs/vertex-ai/vertex-ai-qwikstart/utils/data_download.py", line 186, in <module>
    upload_gcs2bq(args, table_schema)
  File "/home/jupyter/training-data-analyst/self-paced-labs/vertex-ai/vertex-ai-qwikstart/utils/data_download.py", line 116, in upload_gcs2bq
    destination_table = client.get_table(RAW_TABLE_ID)  # Make an API request.
  File "/opt/conda/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 1079, in get_table
    api_response = self._call_api(
  File "/opt/conda/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 827, in _call_api
    return call()
  File "/opt/conda/lib/python3.9/site-packages/google/api_core/retry.py", line 349, in retry_wrapped_func
    return retry_target(
  File "/opt/conda/lib/python3.9/site-packages/google/api_core/retry.py", line 191, in retry_target
    return target()
  File "/opt/conda/lib/python3.9/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/qwiklabs-gcp-00-3f0cc69e5b28/datasets/online_retail/tables/online_retail_clv_raw?prettyPrint=false: Not found: Table qwiklabs-gcp-00-3f0cc69e5b28:online_retail.online_retail_clv_raw

google-cla[bot] commented 1 month ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.