Seagate / halon

High availability solution
Apache License 2.0
1 stars 0 forks source link

[HALON-870] Node direct rebalance (stub) #1508

Closed 1468ca0b-2a64-4fb4-8e52-ea5806644b4c closed 5 years ago

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

⚠️ Requires Mero patch c/17837.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[nit,optional] Consider restyling:

data Options = Options
  { node :: Node
  , rebTimeout :: Int
  } deriving (Show, Eq)

(Halon project does not have any coding style document. I guess we should come up with one... There is Haskell Style Guide to borrow from. TODO™️.)

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[nit] I don't think (..) part is needed.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Not quite. I propose to add another getSDevs in different module. Which one is used will be governed by import statements.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

okay, you propose to modify getSDevs, this may need relevant changes to its callers, mainly in repair related rule functions. They mainly deal with the pool and invoked as a generic function. I thought having a separate function for service SDevs would be better to avoid more changes for now.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

  1. Agree, event posting should not change much though, yes, for rule, ruleRepairStart is used to model ruleDirectRebalanceStart.
  2. I see that it checks for the status of start operation couple of times with some timeout, but yes i will remove it to make it simple and take it from there.
1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Okay.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Done.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Done.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: mssawant

Done.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

getSDevs :: M0.Node -> G.Graph -> [M0.SDev]
getSDevs node rg = S.toList . S.fromList $
  [ sdev
  | proc :: M0.Process <- getIOS node rg
  , svc :: M0.Service <- G.connectedTo proc M0.IsParentOf rg
  , sdev :: M0.SDev <- G.connectedTo svc M0.IsParentOf rg
  ]
1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

What if several processes are linked to the node? I think you should return a list.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

This line is noisy, drop it.

  1. This is not an action, this is a pure function.
  2. It is only used once.
  3. Even if it will be used more widely, we rarely decorate library functions with such descriptions.
1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

s/R/M0/

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

List comprehension is not needed here.

getServiceSDevs :: M0.Service -> G.Graph -> [M0.SDev]
getServiceSDevs svc = S.toList . S.fromList . G.connectedTo svc M0.IsParentOf

Actually, the function is not needed either (and if it was, it wouldn't belong this module, which is about Nodes). See getSDevs in my comment below.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Remove dead code or at least mark it with XXX.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

I presume you should parse Node somehow.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

We will do just fine without node function, the API will be simpler.

data Options = Options Node
  deriving (Show, Eq)
1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

This function is modelled after Resume case of runRepReb. I'm not sure this is a good choice.

  1. hctl mero pool repreb family of commands controls some already running SNS operation (identified by UUID). Your command will be creating (initiating) totally new SNS operation.
  2. I don't quite understand why runRepReb calls f two times. Do you? (I also condemn the choice of name, but that's a different issue.)

I suspect that number 2 comes from the number of requests sent from hctl to the RC (PoolRepairRequest and PoolRebalanceRequest), but I don't understand why runRepReb has to send both requests in one go. Perhaps the answer lies in the implementation of ruleRepairStart and ruleRebalanceStart functions, which I don't know well (and which are rather hairy 💂‍♀️ ).

tl;dr: IMO, Quiesce case of runRepReb would be a more suitable model.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[nit,optional] Start line 23 with = so that data constructors are aligned.

data Options
  = Dreb Dreb.Options
  | Remove Remove.Options
  | Start Start.Options
[...]
1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[nit] Sort imports alphabetically.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

Add some description here what this module is about.