elisabettai closed this issue 2 years ago
@elisabettai @sanderegg - just for additional context, I tried instantiating at least 20 new iSeg nodes on 15.9.22 between 10:00 CET and 16:30 CET. So it is not isolated to the two cases referred to above.
Also got a support request for a sim4life-dy service on 16-09 around 16:00 (Zurich time). The user reported that she could create the sim4life-dy node and the application was loading, but then she got a white screen (see that on Fogbugz).
This is what I found in graylog (study id: a6d3d03a-14bd-11ed-a399-02420a0b0527, node id: a659701d-75c8-4219-95de-3edd9d0b1d88):
ERROR:simcore_service_director_v2.modules.dynamic_sidecar.api_client._base:Unexpected error with 'e.request=<Request('GET', 'http://dy-sidecar_a659701d-75c8-4219-95de-3edd9d0b1d88:8000/health')>': e=ConnectError('[Errno -2] Name or service not known'), (attempt [1/4])
ERROR:simcore_service_director_v2.modules.dynamic_sidecar.api_client._base:Unexpected error with 'e.request=<Request('GET', 'http://dy-sidecar_a659701d-75c8-4219-95de-3edd9d0b1d88:8000/v1/containers?only_status=true')>': e=ConnectError('[Errno -2] Name or service not known'), (attempt [1/4])
ERROR:simcore_service_director_v2.modules.dynamic_sidecar.api_client._base:Unexpected error with 'e.request=<Request('GET', 'http://dy-sidecar_a659701d-75c8-4219-95de-3edd9d0b1d88:8000/v1/containers?only_status=true')>': e=ConnectError('All connection attempts failed'), (attempt [2/4])
Then it retries, and gives a 404.
INFO: [2022-09-16 14:47:57,980/MainProcess] [uvicorn.access:send(438)] - 172.13.5.248:60372 - "GET /v1/containers/name?filters=%7B%22network%22:%20%22dy-sidecar_a659701d-75c8-4219-95de-3edd9d0b1d88%22%7D HTTP/1.1" 404
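For readability: the `filters` query parameter in that request is URL-encoded JSON. A minimal sketch for decoding it, using only the Python standard library (the variable names are mine, just for illustration):

```python
import json
from urllib.parse import unquote

# Query-string value copied verbatim from the uvicorn access log line above.
raw_filters = "%7B%22network%22:%20%22dy-sidecar_a659701d-75c8-4219-95de-3edd9d0b1d88%22%7D"

# Percent-decode the value, then parse the resulting JSON object.
filters = json.loads(unquote(raw_filters))
print(filters)
```

So the request that got the 404 appears to be a container-name lookup filtered by a network named after the node's dy-sidecar (`dy-sidecar_<node id>`).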
Then it tries to upload the workspace?
INFO: [2022-09-16 14:49:04,435/MainProcess] [simcore_sdk.node_data.data_manager:_push_file(41)] - uploading workspace.zip to S3 to a6d3d03a-14bd-11ed-a399-02420a0b0527/a659701d-75c8-4219-95de-3edd9d0b1d88/workspace.zip
It also looks to me like the containers (at least the dy-sidecar and proxy) are running, but "manager3" can't find the node.
According to @GitHK, there should be a fix in master and staging that still needs to be tested.
iSeg was failing due to lack of resources on production.
sim4life-dy should also now start properly on production; the issue was hot fixed.
I've just tested on osparc.io and it works (added new iSeg and sim4life-dy nodes).
Hot fixed with: https://github.com/ITISFoundation/osparc-simcore/releases/tag/v1.34.6 (checked with PC)
Reporting an issue for @newton1985: it happened for 2 iSeg nodes on osparc.io on 15-09, between 10:06 UTC and 10:27 UTC. These are newly created iSeg nodes.
For the first node, this is what I got from graylog (searching "909e0a1d-bd83-4d26-8633-a51e671c18bc", between the time range above):
director-v2:
Then it retries a few times and fails with (I am omitting the full message; it can be retrieved with OEC:140448794075776):
Then:
The other iSeg node (89e852f7-6501-4a7d-ae26-89f45ea5e52c) fails in a similar manner.
fyi @sanderegg. Maybe it should already have been fixed by PR #3218