Closed andriytk closed 5 years ago
changed the description
Ok, fixed.
There is no such function anymore.
It's too long already.
added 3 commits
changed this line in version 5 of the diff
[optional]
let exclude p =
return . Right $ (rlens fldWaitingProcs %~ fieldMap (filter (/= p))) l
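A minimal sketch of what the line above does, with a plain record standing in for halon's extensible-record state (`rlens`, `fieldMap` and `fldWaitingProcs` are halon internals; the `LocalState` type and its `Int` process field here are invented purely for illustration):

```haskell
-- Hypothetical stand-in for the rule's local state; halon uses a
-- vinyl-style extensible record accessed via rlens/fieldMap instead.
data LocalState = LocalState { waitingProcs :: [Int] }
  deriving (Eq, Show)

-- Drop process p from the waiting list, as `filter (/= p)` does in the diff.
exclude :: Int -> LocalState -> LocalState
exclude p l = l { waitingProcs = filter (/= p) (waitingProcs l) }
```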
I find the name (right) confusing.
s/Unstarted/NotOnline/
please. (Yes, I'm aware that the function was written by someone else.)
0) The name of this function is misleading. One has to see its documentation or implementation in order to understand what it actually does.
1) We exclude PSOnline, not PSStarting.
-- | Process state. This is a generalisation of what might be reported to Mero.
data ProcessState =
PSUnknown -- ^ Process state is not known.
| PSOffline -- ^ Process is stopped.
| PSStarting -- ^ Process is starting but we have not confirmed started.
| PSOnline -- ^ Process is online.
| PSQuiescing -- ^ Process is online, but should reject any further requests.
| PSStopping -- ^ Process is currently stopping.
| PSFailed String -- ^ Process has failed, with reason given
| PSInhibited ProcessState -- ^ Process state is masked by a higher level
-- failure.
2) "Unstarted" is not an English word (and the "un-" prefix forms an antonym, which in this case would be "stopped").
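To illustrate the point about excluding PSOnline rather than PSStarting, here is a self-contained sketch of an "is online" predicate over the ProcessState type quoted above. The `isOnline` helper, the `deriving` clauses, and the recursive treatment of PSInhibited are assumptions for illustration, not halon code:

```haskell
-- Copy of the ProcessState type from the review, with derived instances
-- added so the sketch is runnable on its own.
data ProcessState
  = PSUnknown
  | PSOffline
  | PSStarting
  | PSOnline
  | PSQuiescing
  | PSStopping
  | PSFailed String
  | PSInhibited ProcessState
  deriving (Eq, Show)

-- Only PSOnline counts as online; a filter for "not online" processes
-- must therefore test PSOnline, not PSStarting.
isOnline :: ProcessState -> Bool
isOnline PSOnline        = True
isOnline (PSInhibited s) = isOnline s  -- judge a masked state by what it masks
isOnline _               = False
```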
s/Srv/Server/
please.
@andriy.tkachuk marked as a Work In Progress
The WIP: prefix in the MR's title signifies that this MR should not be landed.
marked as a Work In Progress
added 2 commits
added 1 commit
added 1 commit
assigned to @vvv
changed the description
changed title from HALON-911: fix processes restart{-ing-} after node failure to HALON-911: fix processes restart after node failure
resolved all discussions
Not worth it.
merged
s/m0tifs/m0t1fs/
Would you mind renaming it to
getNotOnlineSrvProcesses
(see http://gitlab.mero.colo.seagate.com/mero/halon/merge_requests/1585#note_8234 for the justification) or
processStartProcessesOnNode.nodeFailedWith
?
added 7 commits
master
resolved all discussions
added 1 commit
changed this line in version 6 of the diff
unmarked as a Work In Progress
It was possible for the node's processes to sometimes get stuck in a failed state with 'node failure' status on cluster startup. This happened because the node::process::start rule instance did not finish until all the cluster processes had started (including the client ones). As a result, the attempt to restart the processes on the failed-and-restored node failed because the node::process::start rule instance was 'already running'.
Now we finish the rule instance as soon as there are no more processes left to start on the node, or some previously started processes on the node have already failed (due to the node failure, for example).