ITISFoundation / osparc-simcore

🐼 osparc-simcore simulation framework
https://osparc.io
MIT License

Zero downtime: Connected users should be able to continue working with osparc when osparc micro-services are restarted #2212

Closed sanderegg closed 1 month ago

sanderegg commented 3 years ago

Use-case:

  1. someone is connected to osparc, working with studies
  2. the osparc-platform is re-deployed
  3. the user of the osparc platform should be able to continue working seamlessly, maybe with a small acceptable glitch, but should definitely not end up receiving HTTP 500 responses
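The requirement in step 3 can be written down as a check over the status codes a client sees while the platform is re-deployed. This is only a minimal sketch: the function names are hypothetical, nothing here comes from the osparc code base, and it assumes that any 5xx response counts as a failure while slow or retried requests count as an "acceptable glitch".

```python
from typing import Iterable, List


def server_errors(statuses: Iterable[int]) -> List[int]:
    """Return the recorded HTTP status codes that are server errors (5xx)."""
    return [status for status in statuses if 500 <= status <= 599]


def redeploy_is_seamless(statuses: Iterable[int]) -> bool:
    """True if the client never saw a 5xx response during the re-deployment.

    Assumption: the statuses were recorded by a client that kept sending
    requests while the services were being restarted.
    """
    return not server_errors(statuses)
```

A recording such as `[200, 200, 500, 200]` would fail this check, which matches the e2e failure described below.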

references:

graylog entries related to failed e2e

The e2e of isolve-mpi failed because the webserver returned a 500 when listing projects. The logs show that the webserver was restarting at that moment.

Docker reference:

pcrespov commented 3 years ago

Related to #2140

Possible cause:

The swarm is already configured to have zero downtime per service (i.e. a given service gets turned off ONLY when the new one is started). The problem might be that even if each service is ready, the state between services is not. For example, the new webserver is updated correctly but the Traefik proxy has not yet detected it. That would cause a bad gateway failure on a front-end request.
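To check whether the suspected gap between "service is ready" and "proxy routes to it" exists, one could poll an endpoint through the proxy and wait for a stable run of non-5xx answers. The following is a minimal sketch, not the project's tooling: `probe` is a hypothetical callable that sends one request through the proxy and returns the HTTP status code, and the thresholds are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_routable(
    probe: Callable[[], int],
    required_successes: int = 3,
    max_attempts: int = 30,
    delay_s: float = 0.0,
) -> bool:
    """Return True once `probe` gave a non-5xx status `required_successes` times in a row.

    `probe` is an assumption of this sketch: any callable returning an
    HTTP status code for a request sent through the proxy.
    """
    streak = 0
    for _ in range(max_attempts):
        try:
            status = probe()
        except OSError:
            # Treat connection errors like an unavailable backend.
            status = 503
        # A 5xx answer resets the streak; anything else extends it.
        streak = streak + 1 if status < 500 else 0
        if streak >= required_successes:
            return True
        time.sleep(delay_s)
    return False
```

Requiring several consecutive successes rather than a single one is a deliberate choice here: during a rolling update the proxy may alternate between the old and the new task, so one good answer does not show that routing has settled.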

Ideas to solve this problem

sanderegg commented 3 years ago

Testing like so:

GitHK commented 1 month ago

duplicate of #5614