Closed degremont closed 7 years ago
I made a mistake in the ticket title. It should be: "Stack trace on shine start when StartTarget action fails"
Could you please change it for me? It seems I don't have sufficient privileges to do it myself.
Thanks
Original comment by: theryf
Original comment by: degremont
The start event should have been sent by Tune._launch(). I'm trying to understand why this is not the case in your stack trace.
Reproducing this stack trace with -vv could be interesting.
Original comment by: degremont
The -vv option does not give more details except the command run by shine. The stack trace is exactly the same, and gives no hint as to why Tune._launch() is not called before self._run_actions() inside FileSystem.start(). The recursive call flow of _graph_ok/launch/_launch triggered by the first actions.launch() in FileSystem.start() is quite hard to follow.
Original comment by: theryf
"The -vv option does not give more details except the command run by shine."
That was exactly what I was interested in.
Original comment by: degremont
Here is the fix. I will be working on a more complete version of this patch.
diff --git a/lib/Shine/Lustre/Actions/Tune.py b/lib/Shine/Lustre/Actions/Tune.py
index f52e5cc..14ab1f3 100644
--- a/lib/Shine/Lustre/Actions/Tune.py
+++ b/lib/Shine/Lustre/Actions/Tune.py
@@ -97,16 +97,17 @@ class Tune(ActionGroup):
         If this is a final state, raise corresponding events.
         """
-        if status == ACT_OK:
-            self._server.action_event(self, 'done')
-        elif status == ACT_ERROR:
-            # Build an error string
-            errors = []
-            for act in self:
-                if act.status() == ACT_ERROR:
-                    errors.append("'%s' failed" % act._command)
-            result = ErrorResult("\n".join(errors))
-            self._server.action_event(self, 'failed', result)
+        if self._init:
+            if status == ACT_OK:
+                self._server.action_event(self, 'done')
+            elif status == ACT_ERROR:
+                # Build an error string
+                errors = []
+                for act in self:
+                    if act.status() == ACT_ERROR:
+                        errors.append("'%s' failed" % act._command)
+                result = ErrorResult("\n".join(errors))
+                self._server.action_event(self, 'failed', result)
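To illustrate what the guard in this patch changes, here is a minimal toy sketch. It is not shine's code: RecordingServer and ToyTune are hypothetical stand-ins, and it assumes that _init is set only once the action has actually been launched, which this thread does not confirm.

```python
ACT_OK, ACT_ERROR = "ok", "error"


class RecordingServer:
    """Hypothetical stand-in that only records the events it receives."""

    def __init__(self):
        self.events = []

    def action_event(self, act, status, result=None):
        self.events.append(status)


class ToyTune:
    """Hypothetical action mirroring the guarded set_status() of the patch."""

    def __init__(self, server):
        self._server = server
        # Assumption: the flag becomes True only when the action is launched.
        self._init = False

    def launch(self):
        self._init = True
        self._server.action_event(self, 'start')

    def set_status(self, status):
        # Final-state events are raised only if the action was launched,
        # so 'failed' can no longer arrive without a preceding 'start'.
        if self._init:
            if status == ACT_OK:
                self._server.action_event(self, 'done')
            elif status == ACT_ERROR:
                self._server.action_event(self, 'failed')
```

With this guard, calling set_status(ACT_ERROR) on an action that was never launched records no event at all, whereas the same call after launch() records 'start' followed by 'failed'.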
Original comment by: degremont
Original comment by: degremont
Fixed in [a136d1]
Original comment by: degremont
On shine start, if the start action fails, shine crashes and the following stack trace is displayed:
This stack trace comes from the event handling mechanism brought into the Tune action by the fix for ticket #50 (commit f1d5adaadf4faaa41f3f813d865dbc0951120f35). Method set_status() calls self._server.action_event('failed'), which in turn calls Server._del_action(). The problem is that Server.action_event('start') has never been called before, so Server._running_actions is empty, hence the exception.
A quick fix could be to test for the existence of the action before removing it, or to enclose the removal in a try..except block. However, I suspect a dirty behaviour in Tune.set_status() (indeed, it is the only class calling action_event() inside set_status()), but I can't work out what the correct behaviour would be here.
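The two quick fixes mentioned above could look roughly like the following sketch. The Server class here is a hypothetical stand-in, not shine's real one, and it assumes running actions are tracked in a plain list; the real container and the exception it raises may differ.

```python
class Server:
    """Hypothetical stand-in for shine's Server; not the real class."""

    def __init__(self):
        # Assumption: running actions are tracked in a plain list.
        self._running_actions = []

    def _add_action(self, act):
        self._running_actions.append(act)

    def _del_action_strict(self, act):
        # Unguarded removal: list.remove() raises ValueError when the
        # action was never registered by a 'start' event.
        self._running_actions.remove(act)

    def _del_action_guarded(self, act):
        # Option 1: test for existence before removing.
        if act in self._running_actions:
            self._running_actions.remove(act)

    def _del_action_tolerant(self, act):
        # Option 2: attempt the removal and ignore a missing entry.
        try:
            self._running_actions.remove(act)
        except ValueError:
            pass
```

Both variants only hide the symptom: the 'failed' event still arrives without a matching 'start', which is the behaviour questioned above.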
Regards.
Reported by: theryf