First check
[X] I used the GitHub search to find a similar issue and didn't find it.
[X] I searched the Prefect documentation for this issue.
[X] I checked that this issue is related to Prefect and not one of its dependencies.
Bug summary
While a flow run was executing, Prefect logged a large stack trace in the UI that did not appear in the Cloud Run Job logs. The trace reports an SSL EOF error and warns that an issue may have occurred, but the flow run is neither marked as failed nor canceled. Because there was no error on the Google side, the Cloud Run job kept executing, yet the flow run in Prefect continues to show as Running until I cancel it manually.
Reproduction
This has happened to multiple flow runs and is not limited to any deployment in particular.
Error
An error occurred while monitoring flow run '604acb03-918c-4103-bf6f-2d39fcc85617'. The flow run will not be marked as failed, but an issue may have occurred.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 908, in _submit_run_and_capture_errors
result = await self.run(
File "/usr/local/lib/python3.10/site-packages/prefect_gcp/workers/cloud_run_v2.py", line 460, in run
result = await run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 136, in run_sync_in_worker_thread
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/site-packages/prefect_gcp/workers/cloud_run_v2.py", line 731, in _watch_job_execution_and_get_result
execution = self._watch_job_execution(
File "/usr/local/lib/python3.10/site-packages/prefect_gcp/workers/cloud_run_v2.py", line 805, in _watch_job_execution
execution = ExecutionV2.get(
File "/usr/local/lib/python3.10/site-packages/prefect_gcp/models/cloud_run_v2.py", line 361, in get
response = request.execute()
File "/usr/local/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/googleapiclient/http.py", line 923, in execute
resp, content = _retry_request(
File "/usr/local/lib/python3.10/site-packages/googleapiclient/http.py", line 222, in _retry_request
raise exception
File "/usr/local/lib/python3.10/site-packages/googleapiclient/http.py", line 191, in _retry_request
resp, content = http.request(uri, method, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/google_auth_httplib2.py", line 209, in request
self.credentials.before_request(self._request, method, uri, request_headers)
File "/usr/local/lib/python3.10/site-packages/google/auth/credentials.py", line 230, in before_request
self._blocking_refresh(request)
File "/usr/local/lib/python3.10/site-packages/google/auth/credentials.py", line 193, in _blocking_refresh
self.refresh(request)
File "/usr/local/lib/python3.10/site-packages/google/oauth2/service_account.py", line 445, in refresh
access_token, expiry, _ = _client.jwt_grant(
File "/usr/local/lib/python3.10/site-packages/google/oauth2/_client.py", line 308, in jwt_grant
response_data = _token_endpoint_request(
File "/usr/local/lib/python3.10/site-packages/google/oauth2/_client.py", line 268, in _token_endpoint_request
response_status_ok, response_data, retryable_error = _token_endpoint_request_no_throw(
File "/usr/local/lib/python3.10/site-packages/google/oauth2/_client.py", line 215, in _token_endpoint_request_no_throw
request_succeeded, response_data, retryable_error = _perform_request()
File "/usr/local/lib/python3.10/site-packages/google/oauth2/_client.py", line 191, in _perform_request
response = request(
File "/usr/local/lib/python3.10/site-packages/google_auth_httplib2.py", line 119, in __call__
response, data = self.http.request(
File "/usr/local/lib/python3.10/site-packages/httplib2/__init__.py", line 1724, in request
(response, content) = self._request(
File "/usr/local/lib/python3.10/site-packages/httplib2/__init__.py", line 1444, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python3.10/site-packages/httplib2/__init__.py", line 1367, in _conn_request
conn.request(method, request_uri, body, headers)
File "/usr/local/lib/python3.10/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.10/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "/usr/local/lib/python3.10/http/client.py", line 999, in send
self.sock.sendall(data)
File "/usr/local/lib/python3.10/ssl.py", line 1270, in sendall
v = self.send(byte_view[count:])
File "/usr/local/lib/python3.10/ssl.py", line 1239, in send
return self._sslobj.write(data)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2426)
03:35:11 PM
prefect.flow_runs.worker
Encountered an exception while waiting for job run completion - EOF occurred in violation of protocol (_ssl.c:2426)
Versions
Version: 2.16.8
API version: 0.8.4
Python version: 3.11.7
Git commit: 11cb641c
Built: Fri, Mar 29, 2024 11:01 AM
OS/Arch: darwin/x86_64
Profile: default
Server type: cloud
Additional context
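This typically happens after the flow run has been running for about 1 hour. The traceback shows the error is raised while the worker refreshes its service account access token (google/oauth2/service_account.py, refresh), and such tokens last 1 hour by default, so the timing may be related to token expiry.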
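For discussion, here is a minimal sketch of the kind of retry that might let the worker survive a dropped TLS connection while polling the execution status. It is not the prefect-gcp implementation: call_with_ssl_retry and its parameters are hypothetical names I made up, and I am assuming the ssl.SSLEOFError is transient and that repeating the status request is safe.

```python
import ssl
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_ssl_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff on ssl.SSLEOFError.

    Hypothetical helper for illustration only; not part of Prefect or prefect-gcp.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ssl.SSLEOFError:
            # Out of attempts: re-raise so the caller still sees the failure.
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In the worker, the polling call from the traceback could then be wrapped as something like call_with_ssl_retry(lambda: request.execute()). I believe googleapiclient's execute() also accepts a num_retries argument that may cover the same case, but I have not verified that it handles this particular error.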