ITISFoundation / osparc-issues

🐼 issue-only repo for the osparc project

Study hangs while closing / Service pending forever / Problem with pushing outputs? #1161

Closed elisabettai closed 1 year ago

elisabettai commented 1 year ago

Long Story Short: I am trying to close a study on osparc.io that I own (study uuid: a4628a32-3419-11ed-9a85-02420a0b01c2). One of the nodes fails while uploading its outputs. The study is currently locked because I'm closing it. I suspect that at some point it will unlock, but the service will fail to start.

Additional context: The node uuid of the failing node is 5af4ae77-52ff-4036-addb-61f985bdcce0. These are the errors in the dy-sidecar:

log_level=WARNING | log_timestamp=2023-10-16 14:51:13,469 | log_source=servicelib.utils:logged_gather(127) | log_uid=None | log_msg=Error in 1-th concurrent task <coroutine object Port._set at 0x7f5d018745f0>: 503, message='Unexpected error while accessing S3 backend: Unexpected error while accessing S3 backend', url=URL('http://production_storage:8080/v0/locations/0/files/a4628a32-3419-11ed-9a85-02420a0b01c2%252F5af4ae77-52ff-4036-addb-61f985bdcce0%252Foutputs%252Foutput_1%252Foutput_1.zip:complete/futures/upload_complete_task_98561_a4628a32-3419-11ed-9a85-02420a0b01c2%252F5af4ae77-52ff-4036-addb-61f985bdcce0%252Foutputs%252Foutput_1%252Foutput_1.zip?user_id=98561')

log_level=INFO | log_timestamp=2023-10-16 14:51:13,476 | log_source=uvicorn.access:send(478) | log_uid=None | log_msg=172.13.246.244:44538 - "GET /v1/containers?only_status=false HTTP/1.1" 200

log_level=WARNING | log_timestamp=2023-10-16 14:51:13,565 | log_source=simcore_service_dynamic_sidecar.modules.outputs._manager:_remove_downloads(151) | log_uid=None | log_msg=outputs_manager_port_keys-output_1 ended with exception: 503, message='Unexpected error while accessing S3 backend: Unexpected error while accessing S3 backend', url=URL('http://production_storage:8080/v0/locations/0/files/a4628a32-3419-11ed-9a85-02420a0b01c2%252F5af4ae77-52ff-4036-addb-61f985bdcce0%252Foutputs%252Foutput_1%252Foutput_1.zip:complete/futures/upload_complete_task_98561_a4628a32-3419-11ed-9a85-02420a0b01c2%252F5af4ae77-52ff-4036-addb-61f985bdcce0%252Foutputs%252Foutput_1%252Foutput_1.zip?user_id=98561')
Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_service_dynamic_sidecar/modules/outputs/_manager.py", line 133, in _upload_ports
    await upload_outputs(
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_service_dynamic_sidecar/modules/nodeports.py", line 173, in upload_outputs
    await PORTS.set_multiple(ports_values, progress_bar=sub_progress)
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_sdk/node_ports_v2/nodeports_v2.py", line 171, in set_multiple
    results = await logged_gather(*tasks)
  File "/home/scu/.venv/lib/python3.10/site-packages/servicelib/utils.py", line 139, in logged_gather
    raise error
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_sdk/node_ports_v2/port.py", line 330, in _set
    new_value = await port_utils.push_file_to_store(
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_sdk/node_ports_v2/port_utils.py", line 240, in push_file_to_store
    upload_result: UploadedFolder | UploadedFile = await filemanager.upload_path(
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_sdk/node_ports_common/filemanager.py", line 312, in upload_path
    store_id, e_tag, upload_links = await _upload_to_s3(
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_sdk/node_ports_common/filemanager.py", line 398, in _upload_to_s3
    e_tag = await _complete_upload(
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_sdk/node_ports_common/_filemanager.py", line 73, in _complete_upload
    async for attempt in AsyncRetrying(
  File "/home/scu/.venv/lib/python3.10/site-packages/tenacity/_asyncio.py", line 71, in __anext__
    do = self.iter(retry_state=self._retry_state)
  File "/home/scu/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/scu/.venv/lib/python3.10/site-packages/simcore_sdk/node_ports_common/_filemanager.py", line 84, in _complete_upload
    resp.raise_for_status()
  File "/home/scu/.venv/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1005, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 503, message='Unexpected error while accessing S3 backend: Unexpected error while accessing S3 backend', url=URL('http://production_storage:8080/v0/locations/0/files/a4628a32-3419-11ed-9a85-02420a0b01c2%252F5af4ae77-52ff-4036-addb-61f985bdcce0%252Foutputs%252Foutput_1%252Foutput_1.zip:complete/futures/upload_complete_task_98561_a4628a32-3419-11ed-9a85-02420a0b01c2%252F5af4ae77-52ff-4036-addb-61f985bdcce0%252Foutputs%252Foutput_1%252Foutput_1.zip?user_id=98561')
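
The URL in the error above is hard to read because the file id contains `%252F`, which turns into `/` only after two rounds of percent-decoding (`%25` is `%`, and `%2F` is `/`). A minimal sketch for pulling the study and node uuids out of such a logged URL, using only the Python standard library; the helper name `decode_file_id` is hypothetical and not part of osparc-simcore, and the assumption that the file id sits between `/files/` and `:complete` is taken from this one log line only:

```python
from urllib.parse import unquote, urlsplit

# URL copied verbatim from the ClientResponseError in the log above.
LOGGED_URL = (
    "http://production_storage:8080/v0/locations/0/files/"
    "a4628a32-3419-11ed-9a85-02420a0b01c2%252F"
    "5af4ae77-52ff-4036-addb-61f985bdcce0%252F"
    "outputs%252Foutput_1%252Foutput_1.zip"
    ":complete/futures/upload_complete_task_98561_"
    "a4628a32-3419-11ed-9a85-02420a0b01c2%252F"
    "5af4ae77-52ff-4036-addb-61f985bdcce0%252F"
    "outputs%252Foutput_1%252Foutput_1.zip"
    "?user_id=98561"
)


def decode_file_id(url: str) -> str:
    """Return the human-readable file id from a logged storage URL.

    Hypothetical helper: assumes the file id is the part of the path
    between "/files/" and ":complete", as in the log line above.
    """
    path = urlsplit(url).path  # raw path, still percent-encoded
    encoded = path.split("/files/", 1)[1].split(":complete", 1)[0]
    # "%252F" -> "%2F" -> "/", hence two rounds of decoding.
    return unquote(unquote(encoded))


file_id = decode_file_id(LOGGED_URL)
study_uuid, node_uuid, *relative_path = file_id.split("/")

print(file_id)
print("study:", study_uuid)
print("node:", node_uuid)
print("file:", "/".join(relative_path))
```

The decoded study and node uuids match the ones given in the description, so the failing request is the upload-completion call for `outputs/output_1/output_1.zip` of that node.
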

elisabettai commented 1 year ago

@GitHK, could you please have a look (see the error above)? On osparc.io, the sidecar of service 5af4ae77-52ff-4036-addb-61f985bdcce0 is still there and the study has unlocked itself, as predicted.

GitHK commented 1 year ago

@elisabettai we are still trying to figure out what is going wrong here. We don't know yet, so we have a PR to make storage provide more meaningful logs: https://github.com/ITISFoundation/osparc-simcore/pull/4867

GitHK commented 1 year ago

@elisabettai while trying to figure out what was happening, I could not reproduce the error. Impersonating the user, I got the node in question started and managed to save it.

elisabettai commented 1 year ago

Probably related to "Closing new style services behaves slightly differently".

GitHK commented 1 year ago

@elisabettai this should be fixed by this PR: https://github.com/ITISFoundation/osparc-simcore/pull/4924

GitHK commented 1 year ago

The fix is in staging and will be delivered to production soon.