Closed mrnicegyu11 closed 1 year ago
And another log dump from dalco-prod, from running `simcore-service-dynamic-sidecar outputs-push`:
WARNING: [2022-09-23 14:34:40,072/MainProcess] [simcore_sdk.node_ports_common.file_io_utils:logged_gather(122)] - Error in 236-th concurrent task <coroutine object _upload_file_part at 0x7f9d43b744c0>: 404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=236&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=73d008a646de943b4f81b7e6f4a7fefdbde445b18e3b37472a1e60b33c0701b1')
uploading /tmp/tmpvdio3jln/output_1.zip
WARNING: [2022-09-23 14:34:40,072/MainProcess] [simcore_sdk.node_ports_common.file_io_utils:logged_gather(122)] - Error in 237-th concurrent task <coroutine object _upload_file_part at 0x7f9d43b74540>: 404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=237&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=95483d1b14916001d6bb1cd2eff910a64104a26042c2082aced4c23c852d53ee')
uploading /tmp/tmpvdio3jln/output_1.zip
WARNING: [2022-09-23 14:34:40,073/MainProcess] [simcore_sdk.node_ports_common.file_io_utils:logged_gather(122)] - Error in 238-th concurrent task <coroutine object _upload_file_part at 0x7f9d43b745c0>: 404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=238&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=8e7c776287ca473b40ccee7793576dd7f9dfd025749b8c4adeb3db434d28966c')
uploading /tmp/tmpvdio3jln/output_1.zip
WARNING: [2022-09-23 14:34:40,073/MainProcess] [simcore_sdk.node_ports_common.file_io_utils:logged_gather(122)] - Error in 239-th concurrent task <coroutine object _upload_file_part at 0x7f9d43b74640>: 404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=239&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=7c09e7605fc47b5379b9189346319f2175a258836bc4f1e3fd6574677d1dad95')
uploading /tmp/tmpvdio3jln/output_1.zip
WARNING: [2022-09-23 14:34:40,073/MainProcess] [simcore_sdk.node_ports_common.file_io_utils:logged_gather(122)] - Error in 240-th concurrent task <coroutine object _upload_file_part at 0x7f9d43b746c0>: 404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=240&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=65dc99a7b29bc16b7c48f13057ef8e0468afdaa1a6df148cb1225fee33e723bd')
uploading /tmp/tmpvdio3jln/output_1.zip
WARNING: [2022-09-23 14:34:40,074/MainProcess] [simcore_sdk.node_ports_common.file_io_utils:logged_gather(122)] - Error in 241-th concurrent task <coroutine object _upload_file_part at 0x7f9d43b74740>: 404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=241&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=aa57d15a0361a56392c2ba29d5b4f93dbcb0f177eeb07dbc21045e8a6e129a73')
uploading /tmp/tmpvdio3jln/output_1.zip
uploading /tmp/tmpvdio3jln/output_1.zip: 80%|███████████████████████████████████████████ | 1.88G/2.35G [00:28<00:07, 70.1Mbyte/s]
ERROR: [2022-09-23 14:34:40,074/MainProcess] [simcore_sdk.node_ports_common.filemanager:upload_file(355)] - The upload failed with an unexpected error:
Traceback (most recent call last):
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/file_io_utils.py", line 261, in upload_file_to_presigned_links
results = await logged_gather(
File "/home/scu/.venv/lib/python3.9/site-packages/servicelib/utils.py", line 134, in logged_gather
raise error
File "/home/scu/.venv/lib/python3.9/site-packages/servicelib/utils.py", line 113, in sem_task
return await task
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/file_io_utils.py", line 177, in _upload_file_part
async for attempt in AsyncRetrying(
File "/home/scu/.venv/lib/python3.9/site-packages/tenacity/_asyncio.py", line 69, in __anext__
do = self.iter(retry_state=self._retry_state)
File "/home/scu/.venv/lib/python3.9/site-packages/tenacity/__init__.py", line 349, in iter
return fut.result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/file_io_utils.py", line 193, in _upload_file_part
response.raise_for_status()
File "/home/scu/.venv/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1004, in raise_for_status
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=193&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=09c63d5ea46d51eb7b94ab32f6848e11f2bd310d4826fcb9eccd2e49c2fcdfb8')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/filemanager.py", line 340, in upload_file
uploaded_parts = await upload_file_to_presigned_links(
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/file_io_utils.py", line 278, in upload_file_to_presigned_links
raise exceptions.S3TransferError(
simcore_sdk.node_ports_common.exceptions.S3TransferError: Could not upload file /tmp/tmpvdio3jln/output_1.zip:404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=193&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=09c63d5ea46d51eb7b94ab32f6848e11f2bd310d4826fcb9eccd2e49c2fcdfb8')
ERROR: [2022-09-23 14:34:40,077/MainProcess] [asyncio:default_exception_handler(1753)] - Future exception was never retrieved
future: <Future finished exception=ClientOSError(1, '[SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2756)')>
aiohttp.client_exceptions.ClientOSError: [Errno 1] [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2756)
WARNING: [2022-09-23 14:34:40,165/MainProcess] [simcore_sdk.node_ports_common.filemanager:upload_file(359)] - Upload aborted
WARNING: [2022-09-23 14:34:40,166/MainProcess] [servicelib.utils:logged_gather(122)] - Error in 1-th concurrent task <coroutine object Port._set at 0x7f9d43c62040>: Error while transferring to/from S3 storage
Traceback (most recent call last):
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/file_io_utils.py", line 261, in upload_file_to_presigned_links
results = await logged_gather(
File "/home/scu/.venv/lib/python3.9/site-packages/servicelib/utils.py", line 134, in logged_gather
raise error
File "/home/scu/.venv/lib/python3.9/site-packages/servicelib/utils.py", line 113, in sem_task
return await task
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/file_io_utils.py", line 177, in _upload_file_part
async for attempt in AsyncRetrying(
File "/home/scu/.venv/lib/python3.9/site-packages/tenacity/_asyncio.py", line 69, in __anext__
do = self.iter(retry_state=self._retry_state)
File "/home/scu/.venv/lib/python3.9/site-packages/tenacity/__init__.py", line 349, in iter
return fut.result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/file_io_utils.py", line 193, in _upload_file_part
response.raise_for_status()
File "/home/scu/.venv/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1004, in raise_for_status
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=193&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=09c63d5ea46d51eb7b94ab32f6848e11f2bd310d4826fcb9eccd2e49c2fcdfb8')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/filemanager.py", line 340, in upload_file
uploaded_parts = await upload_file_to_presigned_links(
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/file_io_utils.py", line 278, in upload_file_to_presigned_links
raise exceptions.S3TransferError(
simcore_sdk.node_ports_common.exceptions.S3TransferError: Could not upload file /tmp/tmpvdio3jln/output_1.zip:404, message='Not Found', url=URL('https://REDACTED/67406d1c-31af-11ec-8033-02420a0b2de3/3e343d70-9469-4224-a834-1d041db1b141/output_1.zip?partNumber=193&uploadId=2~XPikU5lluH2BmbRPTjR1uHn_32ulQCi&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=4FC34BC6KK7TQEBCBHOA/20220923/us-east-1/s3/aws4_request&X-Amz-Date=20220923T143411Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=09c63d5ea46d51eb7b94ab32f6848e11f2bd310d4826fcb9eccd2e49c2fcdfb8')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/scu/.venv/bin/simcore-service-dynamic-sidecar", line 8, in <module>
sys.exit(main())
File "/home/scu/.venv/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "/home/scu/.venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/scu/.venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/scu/.venv/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/scu/.venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/scu/.venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/scu/.venv/lib/python3.9/site-packages/typer/main.py", line 500, in wrapper
return callback(**use_params) # type: ignore
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_service_dynamic_sidecar/cli.py", line 104, in outputs_push
asyncio.run(_async_outputs_push())
File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_service_dynamic_sidecar/cli.py", line 100, in _async_outputs_push
await task_ports_outputs_push(
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_service_dynamic_sidecar/modules/long_running_tasks.py", line 299, in task_ports_outputs_push
await nodeports.upload_outputs(
File "/home/scu/.venv/lib/python3.9/site-packages/servicelib/async_utils.py", line 168, in wrapper
raise wrapped_result
File "/home/scu/.venv/lib/python3.9/site-packages/servicelib/async_utils.py", line 149, in worker
result = await awaitable
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_service_dynamic_sidecar/modules/nodeports.py", line 146, in upload_outputs
await PORTS.set_multiple(ports_values)
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_v2/nodeports_v2.py", line 152, in set_multiple
results = await logged_gather(*tasks)
File "/home/scu/.venv/lib/python3.9/site-packages/servicelib/utils.py", line 134, in logged_gather
raise error
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_v2/port.py", line 313, in _set
new_value = await port_utils.push_file_to_store(
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_v2/port_utils.py", line 207, in push_file_to_store
store_id, e_tag = await filemanager.upload_file(
File "/home/scu/.venv/lib/python3.9/site-packages/simcore_sdk/node_ports_common/filemanager.py", line 360, in upload_file
raise exceptions.S3TransferError from exc
simcore_sdk.node_ports_common.exceptions.S3TransferError: Error while transferring to/from S3 storage
This happened again, and the following could be seen in storage:
WARNING:simcore_service_storage.simcore_s3_dsm:Dangling multipart uploads '[('2~yKsk1-Jik7aVl2Eh8lo3adnt1ighg30', '29580594-42f9-11ed-b659-02420a0b0018/40f4d20b-5c01-4a5a-8685-8081d4249c65/output_1.zip')]', were aborted. TIP: There were multipart uploads active on S3 with no counter-part in the file_meta_data database. This might indicate that something went wrong in how storage handles multipart uploads!!
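That explains the 404s: once storage aborts the multipart upload, the presigned part URLs for that upload ID no longer resolve. Now the question is: why did storage remove that dangling upload?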
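For context, the check that the warning describes can be pictured roughly as follows. This is a minimal sketch and not storage's actual code: `find_dangling_uploads`, `active_on_s3` and `known_in_db` are hypothetical names, and matching on the (upload ID, object key) pair is an assumption.

```python
from typing import Iterable

# (upload_id, object_key), the same shape as the tuples in the storage warning
UploadRef = tuple[str, str]


def find_dangling_uploads(
    active_on_s3: Iterable[UploadRef],
    known_in_db: Iterable[UploadRef],
) -> list[UploadRef]:
    """Return uploads that are active on S3 but have no counterpart in the database."""
    known = set(known_in_db)
    return [ref for ref in active_on_s3 if ref not in known]


if __name__ == "__main__":
    # Values taken from the warning above; the empty database is hypothetical.
    active = [
        (
            "2~yKsk1-Jik7aVl2Eh8lo3adnt1ighg30",
            "29580594-42f9-11ed-b659-02420a0b0018/40f4d20b-5c01-4a5a-8685-8081d4249c65/output_1.zip",
        )
    ]
    for upload_id, object_key in find_dangling_uploads(active, known_in_db=[]):
        print(f"would abort {upload_id} for {object_key}")
```

If a check like this ran while the database entry for a live upload was missing (not yet written, or already removed), it would abort an upload that is still in flight, and every later part request would return a 404. Whether that is what happened here is still open.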
Just to make this clear, FYI: no "data inconsistency cleanup" took place on dalco (only on master for now), so this can be excluded.
This problem might be a side effect of the issue fixed by https://github.com/ITISFoundation/osparc-simcore/pull/3462
Let's keep this under observation. For sure a 404 will happen if someone does
We already have fixes for this error. A bare 404 Not Found is too generic. Please open a new issue if it occurs again, and include the extra information the error now adds.
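As a rough illustration of the kind of context that makes such an error less generic (not necessarily what the actual fix reports), the part number and upload ID can be read straight from the failing presigned URL. `describe_failed_part` is a hypothetical helper, and the shortened URL in the example is made up.

```python
from urllib.parse import parse_qs, urlsplit


def describe_failed_part(url: str, status: int) -> str:
    """Summarise a failed part upload using the query string of its presigned URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # parse_qs returns a list per key; fall back to "?" if the key is missing
    part_number = query.get("partNumber", ["?"])[0]
    upload_id = query.get("uploadId", ["?"])[0]
    object_key = parts.path.lstrip("/")
    return (
        f"HTTP {status} while uploading part {part_number} "
        f"of multipart upload {upload_id!r} for {object_key!r}"
    )


if __name__ == "__main__":
    # Hypothetical, shortened URL in the same shape as the ones in the log
    print(describe_failed_part(
        "https://REDACTED/project/node/output_1.zip?partNumber=193&uploadId=2~abc",
        404,
    ))
```

With the upload ID in the message, a 404 on the sidecar side can be matched directly against the "Dangling multipart uploads" warning on the storage side.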
As mentioned before, we sometimes get 404s when uploading, although I guess I confused Ceph and AWS: the 404s seem to happen only on Ceph. Here is an example from outputs-push.
This is from dalco:
_Issue created from a Mattermost message by @mrnicegyu11._