Seagate / halon

High availability solution
Apache License 2.0

[HALON-870] Node direct rebalance (stub) #1508

Closed 1468ca0b-2a64-4fb4-8e52-ea5806644b4c closed 5 years ago

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

⚠️ Requires Mero patch c/17837.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Feeling proud of my little doc/halon-rg-schema.html automation.

diff --git a/doc/halon-rg-schema.html b/doc/halon-rg-schema.html
index 7dcce004c..0cf985253 100644                                               
--- a/doc/halon-rg-schema.html                                                  
+++ b/doc/halon-rg-schema.html                                                  
@@ -111,6 +111,7 @@ digraph ResourceGraphSchema {                               
     "M0.Pool" -> "M0.PoolRepairStatus" [label="R.Has"]
     "M0.Pool" -> "M0.PoolId" [label="R.Has"]
     "M0.Pool" -> "M0.DiskFailureVector" [label="R.Has", arrowtail=oinv, dir=both]
+    "M0.Node" -> "M0.NodeDiRebStatus" [label="R.Has", arrowtail=oinv, dir=both]
     "Cas.Host" -> "M0.LNid" [label="R.Has", arrowhead=onormal]                 
     "R.Cluster" -> "M0.Root" [label="R.Has"]
     "R.Cluster" -> "M0.PVerCounter" [label="R.Has"]                            

😎

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

@mssawant Please post the required Mero patch on Gerrit and mention it in the PR description.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Okay, so as I understand it, getIOS returns a [M0.Process], from which one process at a time is bound to ioprocs (to be renamed to ioproc). The same goes for svcs, and this is repeated for all the processes and services. Is this the correct understanding?

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Compilation fails:

Ah, what a fool I am! 🤦‍♂️ This requires a modified Mero with m0_spiel_node_direct_rebalance_start, of course.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

s/svcs/svc/, because the variable corresponds to a single service. Similarly s/ioprocs/proc/, s/sdevs/sdev/.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

getIOS is also misleading, because it returns a list of processes, while ‘IOS’ usually means IO service(s). Each of the processes returned may host various services, not only IOS. So the interface is ad hoc.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

  1. Adding the getIOS API is not justified, as it is used only once. Just put the logic into the getAttachedDevs list comprehension.
  2. s/getAttachedDevs/getAttachedSDevs/ please.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Indeed. Sorry for the noise.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Okay. It would be good to actually understand the "New HA states" changes. At present I think we are speculating, because we are not sure whether the present HA states would work or whether there are bugs in Halon. Maybe we can have testing tasks (corresponding to the suspected areas) to analyse the various behaviours; that may give us better insight into these changes.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Agree. Fixed.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Fixed. A node enters this state when it has been replaced and the admin triggers node rebalance. Once rebalance is complete, the node should move to NSOnline.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Fixed.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Okay.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Agree.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Okay. We should have a coding style doc for Halon similar to Mero.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Removed.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Yes, changed here and at other places too.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Okay.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Okay.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Yes, fixed.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Okay, this looks simpler and more readable.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Done

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Done.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Done.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Agree. Fixed.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Perhaps it is needed for data constructors; otherwise it fails to compile with the following error:

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Yes, it will be the case especially for Single POD solution. Changed the functions as follows:

getIOS :: M0.Node -> G.Graph -> [M0.Process]
getIOS m0n rg =
  [ p
  | p <- G.connectedTo m0n M0.IsParentOf rg
  , any (\s -> M0.s_type s == CST_IOS) $ G.connectedTo p M0.IsParentOf rg
  ]

and

getAttachedDevs :: M0.Node -> G.Graph -> [M0.SDev]
getAttachedDevs node rg = S.toList $ S.fromList
  [ sdevs
  | ioprocs <- Process.getIOS node rg
  , svcs <- G.connectedTo ioprocs M0.IsParentOf rg
  , M0.s_type svcs == CST_IOS
  , sdevs <- G.connectedTo svcs M0.IsParentOf rg :: [M0.SDev]
  ]
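For illustration, here is a minimal self-contained sketch of the shape the review asks for elsewhere in this thread: the function renamed to getAttachedSDevs, singular variable names, and the process lookup inlined into the comprehension instead of a separate getIOS helper. The record types below are toy stand-ins invented for this sketch (they are not Halon's M0.* resources or its G.connectedTo API); only the structure of the comprehension and the Data.Set deduplication carry over.

```haskell
import qualified Data.Set as S

-- Toy stand-ins for Halon's M0.* resources; hypothetical, for illustration only.
data SvcType = CST_IOS | CST_MDS deriving (Eq, Show)

newtype SDev = SDev String deriving (Eq, Ord, Show)

data Service = Service { s_type :: SvcType, s_sdevs :: [SDev] }

newtype Process = Process { p_services :: [Service] }

newtype Node = Node { n_processes :: [Process] }

-- | Unique storage devices attached to the IO services of a node.
-- The process and service traversal is inlined into one comprehension,
-- so no separate getIOS helper is needed.
getAttachedSDevs :: Node -> [SDev]
getAttachedSDevs node = S.toList $ S.fromList
  [ sdev
  | proc <- n_processes node      -- each process hosted on the node
  , svc <- p_services proc        -- each service of that process
  , s_type svc == CST_IOS         -- keep IO services only
  , sdev <- s_sdevs svc           -- each device of that service
  ]

-- Sample node: two processes; device "d2" is reachable via two IO services.
sampleNode :: Node
sampleNode = Node
  [ Process [ Service CST_IOS [SDev "d1", SDev "d2"]
            , Service CST_MDS [SDev "m1"] ]
  , Process [ Service CST_IOS [SDev "d2", SDev "d3"] ]
  ]
```

Loaded in GHCi, getAttachedSDevs sampleNode lists each IOS-attached device once, in ascending order, and skips the device that hangs off the non-IOS service.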

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Okay, removed redundant description. I think it could be used more than once and it would be good to have it in a common place.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: max-seagate

 * Spiel interface is divided into two parts: configuration management and
 * command interface.
 *
 * Configuration management interface is designed in transactional manner.
 * Command interface defines individual, separate calls.

Configuration interface - 99%. As for the command interface - 1% for m0_spiel_{pool,sns,dix}_() in the form they are in master.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

  1. s/NSREBALANCE/NSRebalance/
  2. Under which circumstances should a node enter this state? Leave it?
  3. This will be changed in the course of “New HA states” work.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

I think it should be

, (''Node, Unbounded, ''R.Has, AtMostOne, ''NodeDirectRebalanceStatus)

because several nodes may have the same NodeDirectRebalanceStatus.
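To make the cardinality argument concrete, here is a small sketch of what the two cardinality slots are taken to mean in the comment above: the first limits how many sources may point at one target, the second limits how many targets one source may have. This is a simplified model written for this thread, with edges as plain string pairs; it is an assumption about the semantics, not Halon's actual schema machinery.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical, simplified model of a schema cardinality check.
data Cardinality = AtMostOne | Unbounded deriving (Eq, Show)

-- (source, target), e.g. ("node1", "statusA").
type Edge = (String, String)

-- | Check a set of edges against (source cardinality, target cardinality).
-- Source cardinality: how many sources may point at the same target.
-- Target cardinality: how many targets a single source may have.
respects :: (Cardinality, Cardinality) -> [Edge] -> Bool
respects (srcCard, dstCard) edges =
     ok srcCard (count snd)   -- sources per target
  && ok dstCard (count fst)   -- targets per source
  where
    count key = M.elems $ M.fromListWith (+) [ (key e, 1 :: Int) | e <- edges ]
    ok AtMostOne = all (<= 1)
    ok Unbounded = const True

-- Two nodes sharing one status value.
sharedStatus :: [Edge]
sharedStatus = [("node1", "statusA"), ("node2", "statusA")]

-- One node with two status values.
twoStatuses :: [Edge]
twoStatuses = [("node1", "statusA"), ("node1", "statusB")]
```

Under this reading, (Unbounded, AtMostOne) accepts sharedStatus and rejects twoStatuses, while (AtMostOne, AtMostOne) would also reject sharedStatus, which is the case the comment wants to allow.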

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

s/Pool/Node/

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

@max-seagate What is the likelihood of Spiel existence in EOS-2 release, in %?

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

@mssawant ?

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

@mssawant ?

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

This whole rule is

It looks like dead weight. I'm not sure we should land this code. I'd suggest we just follow the YAGNI principle.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Use if ... then ... else ....

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[optional] Drop the comment. Node.getAttachedDevs is self-descriptive.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Documentation is wrong. (And noisy with current self-descriptive name.)

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[optional] s/NodeDirectRebalanceRequest/NodeDirebReq/ ?

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

  1. Documentation is wrong.
  2. Consider dropping the comment: NodeDirectRebalanceRequest is self-descriptive.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

  1. There is no Start in the request type.
  2. [optional] s/NodeDirectRebalanceStartResult/NodeDirebResp/?
  3. [nit] Align data constructors:
    data NodeDirebResp
      = NodeDirebOk M0.Node
      | NodeDirebErr M0.Node String
      deriving (Eq, Ord, Show, Typeable, Generic)

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

s/HashSet/Set/. We usually use Data.Set to get unique elements.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Please replace nested case with pattern matching:

  let err = liftIO . hPutStrLn stderr
  case stat of
    Just (Evt.NodeDirectRebalanceStartSuccess n) | n == node opts ->
        return True
    Just (Evt.NodeDirectRebalanceStartFailed n e) | n == node opts -> do
        err $ "Node direct rebalance failed: " ++ e
        return False
    Nothing -> do
        err "Node rebalance request timed out"
        return False
    _ -> loop
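
The same flat-case-with-guards shape can be tried outside Halon with a pure sketch. Everything named here (NodeId, Reply, awaitResult) is a hypothetical stand-in for the Evt.* events and the CLI loop; a list of Maybe values plays the role of successive receive attempts, with Nothing modelling a timeout.

```haskell
-- Hypothetical stand-ins for the node fid and the Evt.* reply constructors.
type NodeId = Int

data Reply
  = StartSuccess NodeId
  | StartFailed NodeId String
  deriving (Eq, Show)

-- | Scan replies until one concerns 'self' or a timeout (Nothing) is seen.
-- Returns the success flag and the message that would go to stderr.
awaitResult :: NodeId -> [Maybe Reply] -> (Bool, String)
awaitResult self = loop
  where
    loop [] = (False, "No reply received")
    loop (r : rest) = case r of
      -- One flat case: the guard keeps the node check next to the pattern.
      Just (StartSuccess n) | n == self -> (True, "")
      Just (StartFailed n e) | n == self ->
        (False, "Node direct rebalance failed: " ++ e)
      Nothing -> (False, "Node rebalance request timed out")
      -- Replies about other nodes fail their guard and land here.
      _ -> loop rest
```

When a guard fails, matching continues with the next alternative, so replies addressed to other nodes reach the final wildcard and the loop keeps waiting. That fall-through is what makes the nested case unnecessary.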
1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[optional] "No reply received"

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Rename f to loop.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[optional] The hotchpotch of $ and (...) in the same expression doesn't look good.

void . promulgateEQ eqnids . Evt.NodeDirectRebalanceRequest $ node opts
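
As a quick check that the composed form means the same thing as a fully parenthesized call, here is a toy version. The send function and Request type are invented stand-ins for promulgateEQ and Evt.NodeDirectRebalanceRequest, and an IORef replaces the event queue so the effect can be observed.

```haskell
import Control.Monad (void)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical request type standing in for Evt.NodeDirectRebalanceRequest.
newtype Request = Request Int deriving (Eq, Show)

-- Hypothetical stand-in for promulgateEQ: records the request and
-- returns how many requests have been recorded so far.
send :: IORef [Request] -> Request -> IO Int
send ref req = do
  modifyIORef ref (req :)
  length <$> readIORef ref

-- Fresh, empty request log.
newLog :: IO (IORef [Request])
newLog = newIORef []

-- Composition pipeline with a single ($) at the end, as suggested above.
pipeline :: IORef [Request] -> Int -> IO ()
pipeline ref n = void . send ref . Request $ n

-- The same call written with parentheses only.
parenthesized :: IORef [Request] -> Int -> IO ()
parenthesized ref n = void (send ref (Request n))
```

Both functions wrap the argument, send it, and discard the result; the pipeline form just reads right to left without nesting.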
1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[nit] Please align so that Opt. (those within parentheses) are one under another. (Just shift <> one position to the left.)

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Line of 143 chars is a bit too wide. Kindly split it.

parseNode = Node <$> Helpers.fidOpt
  ( Opt.short 'n' <> Opt.long "node" <> Opt.metavar "FID"
 <> Opt.help "Fid of the node to direct rebalance" )

or

parseNode = Node <$> Helpers.fidOpt
  ( Opt.short 'n'
 <> Opt.long "node"
 <> Opt.metavar "FID"
 <> Opt.help "Fid of the node to direct rebalance" )