cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Bug: gtfs_downloader.validate_gcs_bucket sometimes fails when uploading/downloading data #1078

Closed evansiroky closed 2 years ago

evansiroky commented 2 years ago

Describe the bug

It appears that gtfs_downloader.validate_gcs_bucket sometimes fails to upload (or possibly download) data. This fails the DAG task and, in turn, many of the tasks that depend on it.

To Reproduce

See logs:

2022-02-09 try 1:

[2022-02-11 00:51:36,296] {pod_launcher.py:159} INFO - aiohttp.client_exceptions.ClientResponseError: 503, message='Service Unavailable', url=URL('https://www.googleapis.com/download/storage/v1/b/gtfs-data/o/schedule%2F2022-02-10T00:00:00+00:00%2F238_0%2Fcalendar_dates.txt?alt=media')) [20

2022-02-09 try 2:

[2022-02-11 02:16:23,313] {pod_launcher.py:159} INFO - aiohttp.client_exceptions.ClientOSError: [Errno 32] Broken pipe

2022-02-12 try 1:

[2022-02-14 02:04:44,679] {pod_launcher.py:159} INFO - aiohttp.client_exceptions.ClientOSError: [Errno 32] Broken pipe

2022-02-12 try 2:

[2022-02-14 04:38:01,195] {pod_launcher.py:159} INFO - aiohttp.client_exceptions.ClientOSError: [Errno 32] Broken pipe

Expected behavior

The script should be able to gracefully handle file upload/download exceptions with automatic retries.
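
As a rough illustration of what "automatic retries" could look like, here is a minimal sketch of retrying with exponential backoff and jitter. It is not the project's actual code: retry_with_backoff and all of its parameters are hypothetical names invented for this example. It defaults to retrying on OSError, which covers the "Broken pipe" errors in the logs above. The 503 response would presumably need aiohttp's ClientResponseError added to the retryable tuple, ideally after checking its status.

```python
import random
import time


def retry_with_backoff(func, retryable=(OSError,), max_attempts=5,
                       base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func() and retry on transient errors (hypothetical helper).

    retryable    -- exception types treated as transient (an assumption;
                    tune this to the errors the storage client raises)
    max_attempts -- total number of calls before giving up
    base_delay   -- delay in seconds before the first retry
    max_delay    -- upper bound on the delay between retries
    sleep        -- injectable so tests can avoid real waiting
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Double the delay on each retry, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from tasks that failed together.
            sleep(delay * random.uniform(0.5, 1.0))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_upload():
        # Stand-in for an upload that fails twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise BrokenPipeError(32, "Broken pipe")
        return "uploaded"

    print(retry_with_backoff(flaky_upload, base_delay=0.01))
```

Exceptions outside the retryable tuple propagate immediately, so genuine bugs (for example a wrong bucket name) still fail fast instead of being retried.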

atvaccaro commented 2 years ago

Working on this in https://github.com/cal-itp/gtfs-validator-api but blocked pending permissions on the repo

atvaccaro commented 2 years ago

https://github.com/cal-itp/gtfs-validator-api/pull/3 is open