Logs do not show up for steps that ran on GCP.
I've traced it to gs_tail.py. It reuses the same blob object (self._blob_client) every time it fetches the latest logs. However, after this object is initialised, its generation field is set to the file's latest generation value. When the file is updated, that generation value becomes invalid and the code raises a NotFound error:
404 GET https://storage.googleapis.com/download/storage/v1/b/testbucket/o/tf-full-stack-sysroot%2FSimpleTestFlow%2F18%2Frun_on_cpu_remote%2F274258%2F0.task_stdout.log?alt=media&generation=1709809091417399: No such object: testbucket/tf-full-stack-sysroot/SimpleTestFlow/18/run_on_cpu_remote/274258/0.task_stdout.log: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
It can be reproduced as follows:
>>> from google.cloud import storage
>>> b = storage.Client().bucket("testbucket").blob("hello.json")
>>> b.download_as_bytes()
b'{\n "text": "Hello from the file in the bucket"\n}'
>>> # the same file is re-uploaded (overwritten) at this point
>>> b.download_as_bytes()
Traceback (most recent call last):
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/cloud/storage/client.py", line 1151, in download_blob_to_file
    blob_or_uri._do_download(
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/cloud/storage/blob.py", line 989, in _do_download
    response = download.consume(transport, timeout=timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/resumable_media/requests/download.py", line 237, in consume
    return _request_helpers.wait_and_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/resumable_media/requests/_request_helpers.py", line 155, in wait_and_retry
    response = func()
               ^^^^^^
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/resumable_media/requests/download.py", line 219, in retriable_request
    self._process_response(result)
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/resumable_media/_download.py", line 188, in _process_response
    _helpers.require_status_code(
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/resumable_media/_helpers.py", line 108, in require_status_code
    raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/cloud/storage/blob.py", line 1401, in download_as_bytes
    client.download_blob_to_file(
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/cloud/storage/client.py", line 1164, in download_blob_to_file
    _raise_from_invalid_response(exc)
  File "/Users/erdememekligil/miniconda3/envs/gcp-metaflow/lib/python3.11/site-packages/google/cloud/storage/blob.py", line 4457, in _raise_from_invalid_response
    raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.NotFound: 404 GET https://storage.googleapis.com/download/storage/v1/b/testbucket/o/hello.json?alt=media&generation=1709894045845102: No such object: testbucket/hello.json: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
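One possible workaround, sketched here under assumptions rather than taken from the Metaflow code, is to ask the bucket for a new blob handle on every poll instead of reusing one. A handle returned by bucket.blob(name) carries no generation until it has been used, so the request should not include a stale generation parameter. The helper name read_new_bytes and its offset parameter are hypothetical and only illustrate the idea; this has not been tested against gs_tail.py itself.

```python
def read_new_bytes(bucket, blob_name, offset):
    """Return the bytes of `blob_name` that lie beyond `offset`.

    `bucket` is assumed to behave like google.cloud.storage.Bucket.
    A new handle is requested on every call, so no generation value
    from an earlier request is carried over to this one.
    """
    # Fresh handle: its generation is unset, so the download targets
    # whatever the live version of the object is right now.
    blob = bucket.blob(blob_name)

    # Download the whole object and slice locally. This is wasteful for
    # large logs but avoids range-request edge cases in a sketch.
    data = blob.download_as_bytes()
    return data[offset:]


# Hypothetical usage against a real bucket (requires google-cloud-storage):
#   from google.cloud import storage
#   bucket = storage.Client().bucket("testbucket")
#   new_data = read_new_bytes(bucket, "hello.json", offset=0)
```

download_as_bytes also accepts a start argument for ranged reads, which would avoid re-downloading the whole file on every poll, but the caller would then need to handle the case where no new data has been written yet.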
The code in question: https://github.com/Netflix/metaflow/blob/cbf9b7f198bf2f1e255e0dda5c47324b63cc8bd3/metaflow/plugins/gcp/gs_tail.py#L49
I'm not sure why this error hasn't happened until now. I've also tested it with older versions of the Google components:
setup
Older setup