ITISFoundation / osparc-issues

🐼 issue-only repo for the osparc project

Stop button to get services out of the failed state. (Can probably be linked to #1045) #1167

Open Konohana0608 opened 10 months ago

Konohana0608 commented 10 months ago

Describe the user role As a SPARC investigator, I would like to be able to force stop services if they are in the Failed state.

Describe the goal Very often, when a study is closed because the user logged out, services that were open in that study are stuck in the Failed state when the study is reopened. The "Run" button then often does nothing, and I have no way other than contacting support to get the failed service back up and running. (Linked to #1045)

Describe the benefit To make users less dependent on support and relieve our dev team of these rather unproductive little tasks.

Additional context image

elisabettai commented 10 months ago

Hi @Konohana0608, do you know what exactly the problem was with that "Failed" node?

I also ran into this recently; it failed because there was a problem saving the data while the service was closing. In that case it is probably a good idea to contact support (if you care about the data), isn't it?

Konohana0608 commented 10 months ago

Hey @elisabettai ~~ Ehm as usual, I have absolutely no idea^^ The logs are not very helpful ;)

image

As far as I remember, the only thing I did was fetch the newest version of the TIP Manual repository; I didn't make any manual changes. It makes sense to me that support should be involved if there are autosave issues. But aren't there autosaves at regular intervals anyway? And in my experience, retrieving data from failed services is highly unreliable anyway. Isn't that normally only possible if the service has been closed successfully at some point before?

Maybe a better way would be to add some kind of fail-state analysis to the stop button that decides, based on the reason for the failure, whether the user should be able to force stop the node? Sorry, I probably know far too little about the backend to give meaningful ideas here^^
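
To make the fail-state analysis idea a bit more concrete, here is a rough sketch of what such a decision could look like. Everything in it is hypothetical: `FailureReason`, its members, and `can_force_stop` are made-up names for illustration and are not taken from the osparc backend, and the actual failure causes may well be different.

```python
from enum import Enum, auto


class FailureReason(Enum):
    """Hypothetical failure causes (assumed, not from the osparc codebase)."""

    SAVE_ERROR = auto()     # data could not be saved while the service was closing
    START_TIMEOUT = auto()  # service never came up
    UNKNOWN = auto()        # logs give no usable cause


def can_force_stop(reason: FailureReason, has_unsaved_data: bool) -> bool:
    """Return True if the user may force stop the failed node themselves."""
    # A failed save with data still at risk is the case where support should step in.
    if reason is FailureReason.SAVE_ERROR and has_unsaved_data:
        return False
    # With an unknown cause, stay conservative only if data could be lost.
    if reason is FailureReason.UNKNOWN:
        return not has_unsaved_data
    # Everything else is treated as safe to force stop.
    return True
```

In this sketch the stop button would only be disabled when a force stop could cost the user data; in all other cases the user could recover the node without contacting support.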