cooling-singapore / saas-middleware

Simulation-as-a-Service (SaaS) Middleware
MIT License

Fix issue with restarting execution nodes #450

Closed: HeikoAydt closed this issue 5 months ago

HeikoAydt commented 6 months ago

Description

Restarting execution nodes may result in "No state found" errors:

Traceback (most recent call last):
  File "/mnt/storage/dev/venv-saas/lib/python3.10/site-packages/saas/cli/saas_cli.py", line 112, in main
    cli.execute(sys.argv[1:])
  File "/mnt/storage/dev/venv-saas/lib/python3.10/site-packages/saas/cli/helpers.py", line 424, in execute
    super().execute(args)
  File "/mnt/storage/dev/venv-saas/lib/python3.10/site-packages/saas/cli/helpers.py", line 388, in execute
    command.execute(args)
  File "/mnt/storage/dev/venv-saas/lib/python3.10/site-packages/saas/cli/helpers.py", line 388, in execute
    command.execute(args)
  File "/mnt/storage/dev/venv-saas/lib/python3.10/site-packages/saas/cli/helpers.py", line 388, in execute
    command.execute(args)
  File "/mnt/storage/dev/venv-saas/lib/python3.10/site-packages/saas/cli/cmd_rti.py", line 251, in execute
    status: ProcessorStatus = rti.get_status(proc_id)
  File "/mnt/storage/dev/venv-saas/lib/python3.10/site-packages/saas/rti/proxy.py", line 75, in get_status
    result = self.get(f"proc/{proc_id}/status")
  File "/mnt/storage/dev/venv-saas/lib/python3.10/site-packages/saas/rest/proxy.py", line 220, in get
    return extract_response(response)
  File "/mnt/storage/dev/venv-saas/lib/python3.10/site-packages/saas/rest/proxy.py", line 65, in extract_response
    raise exception
saas.rest.exceptions.UnsuccessfulRequestError: No state found for processor 131cfd9a3159e73d69bd791a1fc93375db37e58d4004a4371a75d2e2dba502b6

This may be caused by the node trying to redeploy previously undeployed (and now unavailable) procs when it restarts. In the example above, only these procs were deployed prior to the restart:

8a231c0a68f715c29b60311d89be2f2486418918a0943a15b032629a7f5803fe:ucm-mva-uwc [OPERATIONAL] pending=(none) active=(none)
fc0697d6618f669a331535b02444bcbf1529aec101555af605b6ba6a12289f68:ucm-mva-uvp [OPERATIONAL] pending=(none) active=(none)
6a8c48567f570c5053cf158bc51960e9db6e4dca417b71f35840cc48a84b06d0:bem-cea-sim [OPERATIONAL] pending=(none) active=(none)
3606afb7675d3b1c71f3ba65bf6778d7b98501f5bc197e1d1a26054a11ebcc84:bem-cea-eei [OPERATIONAL] pending=(none) active=(none)
94d68935a043337a94cbdad29de0b652d9a7f54ecddfda6890fbfcb70f811525:bem-cea-gen [OPERATIONAL] pending=(none) active=(none)
a93790bf3a74761ef0d9136b0c628d305fcb80da29e3e443afdb92fb3fe416b4:ucm-palm-prep [OPERATIONAL] pending=(none) active=(none)

The ID of the proc that caused the problem was 131cfd9a3159e73d69bd791a1fc93375db37e58d4004a4371a75d2e2dba502b6, which is not in that list but had been deployed (and then undeployed) before the restart.
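
If the hypothesis above holds, one possible direction is for the node to restore only those procs that were still deployed when it shut down, and to skip records of undeployed procs instead of trying to bring them back. The following is a minimal sketch of that idea, not the actual saas-middleware code: `ProcRecord`, `Node`, `restore` and the `deployed` flag are hypothetical names made up for illustration, and the short ids stand in for the real 64-character proc ids.

```python
from dataclasses import dataclass, field


@dataclass
class ProcRecord:
    """Hypothetical persisted record of a proc known to the node."""
    proc_id: str
    name: str
    deployed: bool  # False once the proc has been undeployed


@dataclass
class Node:
    """Hypothetical stand-in for the in-memory state of an execution node."""
    states: dict = field(default_factory=dict)

    def restore(self, records):
        """Rebuild proc state on restart; return ids of skipped stale records."""
        skipped = []
        for rec in records:
            if rec.deployed:
                self.states[rec.proc_id] = "OPERATIONAL"
            else:
                # Assumption: undeployed procs must not be redeployed on restart.
                skipped.append(rec.proc_id)
        return skipped

    def get_status(self, proc_id):
        """Return the state of a proc, or raise if the node has none for it."""
        if proc_id not in self.states:
            raise LookupError(f"No state found for processor {proc_id}")
        return self.states[proc_id]


# Records as they might look after a restart (illustrative ids only).
records = [
    ProcRecord("proc-a", "ucm-mva-uwc", deployed=True),
    ProcRecord("proc-b", "bem-cea-sim", deployed=True),
    ProcRecord("proc-stale", "previously-undeployed", deployed=False),
]

node = Node()
skipped = node.restore(records)
```

With this approach the stale record never reaches the set of restored procs, so listing the deployed procs and querying their status stay consistent with each other. A status request for the stale id would still fail, but only if a caller asks for it explicitly.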

Outcomes