Closed 1468ca0b-2a64-4fb4-8e52-ea5806644b4c closed 5 years ago
But it only changes PoolRepairStatus, not the drive status.
marked the task abort of SNS operation should reset states of involved storage devices [context] as incomplete
marked the task abort of SNS operation should reset states of involved storage devices [context] as completed
changed the description
marked as a Work In Progress
- Where PoolRepairStatus used to be set before this change?
I've found out that in the original code (master) PoolRepairStatus is being set in mkRepairStartOperation.mkRepairOperationStarted.
Sorry for the noise.
- Where PoolRepairStatus used to be set before this change?
We did not change any code related to this. But we can check it on master by manually triggering an abort using the hctl command.
- Shouldn't similar change be made in ruleRebalanceStart?
Yes, you are right.
Yes, this fix needs to cover rebalance too.
@rajanikant.chirmade Please answer the remaining comments and propose a solution for the “sdevs remain in SDSInhibited SDSRepairing state” problem.
added 1 commit
@rajanikant.chirmade ?
[nit] Use blank lines to separate sections of code.
added 1 commit
Document.
, getConfObjState sdev rg `elem` [ M0_NC_REPAIR
                                 , M0_NC_REBALANCE
                                 ]
?
What about M0_NC_REBALANCE? I think we want to abort it as well. In fact, we want to abort any ongoing SNS operation.
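A minimal sketch of the suggested guard, assuming a simplified state type and a plain list of (device, state) pairs in place of the real resource graph and getConfObjState. Only the constructor names M0_NC_REPAIR and M0_NC_REBALANCE come from this review; SDev, ConfObjState as defined here, and sdevsInSns are hypothetical.

```haskell
-- Hypothetical stand-in for the real configuration object state type.
data ConfObjState
  = M0_NC_ONLINE
  | M0_NC_FAILED
  | M0_NC_REPAIR
  | M0_NC_REBALANCE
  deriving (Eq, Show)

-- A storage device is represented by its name only (an assumption).
type SDev = String

-- Keep the devices that take part in any ongoing SNS operation,
-- that is, repair or rebalance.
sdevsInSns :: [(SDev, ConfObjState)] -> [SDev]
sdevsInSns devs =
  [ sdev
  | (sdev, st) <- devs
  , st `elem` [M0_NC_REPAIR, M0_NC_REBALANCE]
  ]
```

Listing both states in one place means the abort path treats repair and rebalance uniformly.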
Since abortRepairFromProc is called for any timed-out process (see ruleProcessKeepaliveReply), it is quite possible that proc does not host CST_IOS service. In this case pools will be null and Log.DEBUG "Repair not running" message will be printed.
Are we okay with that?
The Prelude. qualifier is not needed.
Suggestion: add a type signature.
getProcs :: [(Fid, M0.TimeSpec)] -> Graph -> [(M0.Process, M0.TimeSpec)]
getProcs fids rg = [ (p, t)
                   | (fid, t) <- fids
                   , Just p <- [M0.lookupConfObjByFid fid rg]
                   ]
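The comprehension relies on a standard Haskell idiom: a generator whose pattern fails to match (here Just p against Nothing) silently skips that element. A self-contained sketch of the same idiom, assuming a plain Data.Map in place of the resource graph and hypothetical type synonyms for Fid, Process and TimeSpec:

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-ins for the real Halon/Mero types.
type Fid = Int
type Process = String
type TimeSpec = Integer

-- Resolve fids to processes, dropping fids that are not in the map.
getProcs :: [(Fid, TimeSpec)] -> Map.Map Fid Process -> [(Process, TimeSpec)]
getProcs fids rg =
  [ (p, t)
  | (fid, t) <- fids
  , Just p <- [Map.lookup fid rg]  -- a failed match drops the entry
  ]
```

Unknown fids are therefore filtered out without any partial function or explicit case analysis.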
[nit] Remove braces.
added 3 commits
master
assigned to @rajanikant.chirmade
@rajanikant.chirmade You've mentioned some problem with this patch. Can you describe what exactly is not working?
AFAIU, the $M0_CLUSTER file you test against looks like this:
clovis-apps: [ client1.local ]
confds: [ cmu.local ]
ssus:
  - host: ssu1.local
    disks: /dev/sd[b-h]
  - host: ssu2.local
    disks: /dev/sd[b-h]
How do you test the patch? What happens? What is the expected behaviour?
uuid is unnecessary; we can do fine without an auxiliary identifier.
fromJust is dangerous and should generally be avoided. It will throw a runtime error if prs is Nothing.
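One way to avoid fromJust is to handle both cases of the Maybe explicitly. The sketch below is only illustrative: the PoolRepairStatus record, its prsPoolName field, and describeRepair are hypothetical simplifications, not the definitions used in the patch.

```haskell
-- Hypothetical, simplified stand-in for the real PoolRepairStatus.
data PoolRepairStatus = PoolRepairStatus { prsPoolName :: String }

-- Total alternative to `fromJust prs`: both cases are handled, so a
-- missing status produces a message instead of a runtime error.
describeRepair :: Maybe PoolRepairStatus -> String
describeRepair mprs = case mprs of
  Nothing  -> "Repair not running"
  Just prs -> "Aborting repair of " ++ prsPoolName prs
```

The same shape works inside a rule: the Nothing branch logs and returns, and only the Just branch touches the status.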
Use explicit import list, i.e.
import HA.RecoveryCoordinator.Castor.Drive.Rules.Repair (abortRepairFromProc)
[nit] Parentheses are not needed here.
added 6 commits
master
assigned to @valery.vorotyntsev
Created by: rajanikantchirmade
Yes, this needs to be fixed. Cherry-picked your patch. Thanks.
Created by: vvv
What if several sdevs of the proc are being repaired (M0_NC_REPAIR)? What if two of those sdevs belong to different pools?
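One possible way to handle the multi-pool case is to group the affected devices by pool first and then abort once per pool. This is a sketch under stated assumptions, not the approach taken in the patch: Pool, SDev and sdevsByPool are hypothetical, and pools and devices are reduced to plain names.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins: pools and devices are identified by name.
type Pool = String
type SDev = String

-- Group devices by the pool they belong to, preserving input order
-- within each pool.
sdevsByPool :: [(Pool, SDev)] -> Map.Map Pool [SDev]
sdevsByPool pairs =
  Map.fromListWith (flip (++)) [ (pool, [sdev]) | (pool, sdev) <- pairs ]
```

Iterating over the resulting map gives one entry per pool, so each pool's repair status and device states can be reset together.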
Created by: rajanikantchirmade
Add tentative fix.
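TODO
- Where PoolRepairStatus used to be set before this change?
- Shouldn't similar change be made in ruleRebalanceStart?
- abort of SNS operation should reset states of involved storage devices [context]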