Seagate / halon

High availability solution
Apache License 2.0

[XXX-DONTLAND] RC/Notifier: avoid odd phase switches #1440

1468ca0b-2a64-4fb4-8e52-ea5806644b4c closed this 6 years ago

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: andriytk

Implemented a phase-switching guard for the notifier which checks whether a graph-change notification confirmation is the one it is waiting for. This avoids odd notifier phase switches, especially on large cluster configurations where there may be a large number of notifier instances waiting for their notification confirmations.
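
A minimal self-contained sketch of the idea (hypothetical names and types; in halon the guard runs inside the notifier rules in the Process monad):

import Data.UUID (UUID)  -- from the uuid package

-- Hypothetical confirmation message: it carries the UUID of the
-- graph-change notification it acknowledges.
newtype Confirmation = Confirmation UUID

-- The guard in miniature: a notifier instance switches phase only on
-- a confirmation it is actually waiting for; confirmations addressed
-- to other notifier instances leave it in the waiting phase.
notificationGuard :: [UUID]       -- notifications this instance awaits
                  -> Confirmation
                  -> Maybe UUID
notificationGuard pending (Confirmation u)
  | u `elem` pending = Just u     -- ours: proceed with the phase switch
  | otherwise        = Nothing    -- not ours: keep waiting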

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: andriytk

The performance regression could be caused by the fact that messages are not taken out of the CEP buffer when the guard does not pass, which means that the same messages will be checked again and again, many times:

extractMatchMsg :: Serializable a
                => (a -> g -> l -> Process (Maybe b))
                -> l
                -> Buffer
                -> State.StateT g Process (Maybe (Extraction b))
extractMatchMsg p l buf = go (-1)  -- every call rescans from the start
  where
    go lastIdx =
        case bufferGetWithIndex lastIdx buf of
          Just (newIdx, a, newBuf) -> do
            res <- State.get >>= \g -> lift (Catch.mask_ $ trySome $ p a g l)
            case res of
              Left _ -> return Nothing
              -- HERE: the guard rejected the message, but the message
              -- stays in the buffer, so the next invocation will run
              -- the guard over it again.
              Right Nothing  -> go newIdx
              Right (Just b) ->
                let ext = Extraction
                          { _extractBuf = newBuf
                          , _extractMsg = b
                          , _extractIndex = newIdx
                          }
                in return $ Just ext
          _ -> return Nothing
    trySome :: Process a -> Process (Either Catch.SomeException a)
    trySome = Catch.try
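
For scale, a back-of-the-envelope model of that cost (plain arithmetic, not halon code): if the i-th wake-up of the rule rescans all i messages accumulated so far and the guard rejects each of them, n wake-ups perform O(n^2) guard evaluations in total, versus O(n) if rejected messages were consumed or the scan resumed where it left off.

-- Toy cost model: wake-up i re-checks i buffered messages.
totalGuardCalls :: Int -> Int
totalGuardCalls n = sum [1 .. n]

main :: IO ()
main = print (totalGuardCalls 128)  -- 8256 checks for 128 wake-ups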
1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: andriytk

I struggled with this, oh yes...

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: andriytk

I'm not sure it is because of this...

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: andriytk

I know; I tried different ones and chose the one that looked best to me here.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: andriytk

Yes. To me it seems clear from the context.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: vvv

`any` is lazy, `foldl'` is strict. This explains the increased number of allocations and the performance degradation.
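
A tiny self-contained illustration of the difference (toy predicate, not the halon code):

import Data.List (foldl')

-- any short-circuits: it stops at the first element satisfying the
-- predicate, so it terminates even on an infinite list.
shortCircuits :: Bool
shortCircuits = any even [1 ..]   -- True after inspecting 1 and 2

-- foldl' is strict in its accumulator and walks the entire list
-- spine, allocating as it goes; an any written with foldl' cannot
-- stop early.
noShortCircuit :: [Int] -> Bool
noShortCircuit = foldl' (\acc x -> acc || even x) False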

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: vvv

  1. No braces are needed around FieldRec l.

  2. [nit] The commonly used indentation style is to put => and -> under ::, unless the type signature fits in 80 characters:

    nGuard :: HasCallStack
           => HAEvent InternalObjectStateChangeMsg
           -> g -> FieldRec l -> Process (Maybe UUID)

    or

    nGuard :: HasCallStack
           => HAEvent InternalObjectStateChangeMsg
           -> g
           -> FieldRec l
           -> Process (Maybe UUID)

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: vvv

What does "Ns" stand for? Notifications?

s/nGuard/notificationGuard/  -- or simply `guard`
s/pendingNs/pendingNotifications/
1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: vvv

s/iosc/changes/

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: vvv

You can output an error message without `trace`: an IO computation can be lifted to `Process` with `liftIO`.

I guess the proper way is to use the log<Something> API.
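
For instance, a hypothetical helper along these lines (logError is not an existing halon function):

import Control.Distributed.Process (Process, liftIO)
import System.IO (hPutStrLn, stderr)

-- Print an error message from a Process computation by lifting the
-- IO action, instead of going through the trace facility.
logError :: String -> Process ()
logError msg = liftIO $ hPutStrLn stderr ("error: " ++ msg)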

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: vvv

We should avoid leaving traces in production code.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: andriytk

Actually, as it turns out, cluster startup with 60 clovis-apps shows worse results with this patch for some reason: it allocates more memory and takes more time to start. Compare:

Without the patch:

Jul 04 10:57:03 sage73 time[8140]: 70,816,572,032 bytes allocated in the heap
Jul 04 10:57:03 sage73 time[8140]: 1,158,958,224 bytes copied during GC
Jul 04 10:57:03 sage73 time[8140]: 50,838,744 bytes maximum residency (26 sample(s))
Jul 04 10:57:03 sage73 time[8140]: 16,286,888 bytes maximum slop
Jul 04 10:57:03 sage73 time[8140]: 216 MB total memory in use (0 MB lost due to fragmentation)
Jul 04 10:57:03 sage73 time[8140]: Tot time (elapsed)  Avg pause  Max pause
Jul 04 10:57:03 sage73 time[8140]: Gen  0      2011 colls,     0 par    2.187s   2.483s     0.0012s    0.0088s
Jul 04 10:57:03 sage73 time[8140]: Gen  1        26 colls,     0 par    0.613s   0.736s     0.0283s    0.0511s
Jul 04 10:57:03 sage73 time[8140]: TASKS: 19 (13 bound, 6 peak workers (6 total), using -N1)
Jul 04 10:57:03 sage73 time[8140]: SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
Jul 04 10:57:03 sage73 time[8140]: INIT    time    0.002s  (  0.004s elapsed)
Jul 04 10:57:03 sage73 time[8140]: MUT     time   90.489s  (448.165s elapsed)
Jul 04 10:57:03 sage73 time[8140]: GC      time    2.800s  (  3.219s elapsed)
Jul 04 10:57:03 sage73 time[8140]: RP      time    0.000s  (  0.000s elapsed)
Jul 04 10:57:03 sage73 time[8140]: PROF    time    0.000s  (  0.000s elapsed)
Jul 04 10:57:03 sage73 time[8140]: EXIT    time    0.000s  (  0.040s elapsed)
Jul 04 10:57:03 sage73 time[8140]: Total   time   93.298s  (451.428s elapsed)
Jul 04 10:57:03 sage73 time[8140]: Alloc rate    782,602,823 bytes per MUT second
Jul 04 10:57:03 sage73 time[8140]: Productivity  97.0% of total user, 99.3% of total elapsed
Jul 04 10:57:03 sage73 time[8140]: gc_alloc_block_sync: 0
Jul 04 10:57:03 sage73 time[8140]: whitehole_spin: 0
Jul 04 10:57:03 sage73 time[8140]: gen[0].sync: 0
Jul 04 10:57:03 sage73 time[8140]: gen[1].sync: 0
Jul 04 10:57:04 sage73 time[8140]: 93.42user 26.85system 7:32.39elapsed 26%CPU (0avgtext+0avgdata 567364maxresident)k
Jul 04 10:57:04 sage73 time[8140]: 0inputs+480560outputs (97major+201945minor)pagefaults 0swaps

$ cat /var/log/halon.decision.log | awk '/^2018/ {t=$2} /notifyMeroAsynch.epoch/ {printf("%s: %s\n", t, $0)} /markNotificationDelivered/ {m=$0} /tryCompleteStateDiff.remaining.*\[\]/ {printf("%s: %s\n", t, m)}'
09:49:40.880283:     §DEBUG notifyMeroAsynch.epoch => 0 notifyMeroAsynch.nodes => [] ()
09:49:40.880283: 
09:49:44.931221:     §DEBUG notifyMeroAsynch.epoch => 1
...
09:51:59.64498:     §Invoked: markNotificationDelivered epoch => 125
09:52:00.301219:     §Invoked: markNotificationDelivered epoch => 126
09:52:00.451257:     §Invoked: markNotificationDelivered epoch => 127

With the patch:

Jul 04 10:45:43 sage73 time[6343]: 130,703,798,736 bytes allocated in the heap
Jul 04 10:45:43 sage73 time[6343]: 1,044,565,280 bytes copied during GC
Jul 04 10:45:43 sage73 time[6343]: 51,887,736 bytes maximum residency (26 sample(s))
Jul 04 10:45:43 sage73 time[6343]: 16,049,576 bytes maximum slop
Jul 04 10:45:43 sage73 time[6343]: 198 MB total memory in use (0 MB lost due to fragmentation)
Jul 04 10:45:43 sage73 time[6343]: Tot time (elapsed)  Avg pause  Max pause
Jul 04 10:45:43 sage73 time[6343]: Gen  0      3845 colls,     0 par    2.446s   2.583s     0.0007s    0.0068s
Jul 04 10:45:43 sage73 time[6343]: Gen  1        26 colls,     0 par    0.582s   0.620s     0.0239s    0.0392s
Jul 04 10:45:43 sage73 time[6343]: TASKS: 19 (13 bound, 6 peak workers (6 total), using -N1)
Jul 04 10:45:43 sage73 time[6343]: SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
Jul 04 10:45:43 sage73 time[6343]: INIT    time    0.002s  (  0.004s elapsed)
Jul 04 10:45:43 sage73 time[6343]: MUT     time  115.395s  (207.389s elapsed)
Jul 04 10:45:43 sage73 time[6343]: GC      time    3.028s  (  3.204s elapsed)
Jul 04 10:45:43 sage73 time[6343]: RP      time    0.000s  (  0.000s elapsed)
Jul 04 10:45:43 sage73 time[6343]: PROF    time    0.000s  (  0.000s elapsed)
Jul 04 10:45:43 sage73 time[6343]: EXIT    time    0.020s  (  0.023s elapsed)
Jul 04 10:45:43 sage73 time[6343]: Total   time  118.455s  (210.619s elapsed)
Jul 04 10:45:43 sage73 time[6343]: Alloc rate    1,132,666,171 bytes per MUT second
Jul 04 10:45:43 sage73 time[6343]: Productivity  97.4% of total user, 98.5% of total elapsed
Jul 04 10:45:43 sage73 time[6343]: gc_alloc_block_sync: 0
Jul 04 10:45:43 sage73 time[6343]: whitehole_spin: 0
Jul 04 10:45:43 sage73 time[6343]: gen[0].sync: 0
Jul 04 10:45:43 sage73 time[6343]: gen[1].sync: 0
Jul 04 10:45:44 sage73 time[6343]: 119.43user 10.05system 3:31.62elapsed 61%CPU (0avgtext+0avgdata 542504maxresident)k
Jul 04 10:45:44 sage73 time[6343]: 32inputs+453664outputs (92major+193443minor)pagefaults 0swaps

$ cat /var/log/halon.decision.log | awk '/^2018/ {t=$2} /notifyMeroAsynch.epoch/ {printf("%s: %s\n", t, $0)} /markNotificationDelivered/ {m=$0} /tryCompleteStateDiff.remaining.*\[\]/ {printf("%s: %s\n", t, m)}'
09:42:21.818555:     §DEBUG notifyMeroAsynch.epoch => 0 notifyMeroAsynch.nodes => [] ()
09:42:21.818555: 
09:42:25.861871:     §DEBUG notifyMeroAsynch.epoch => 1
...
09:45:01.378134:     §Invoked: markNotificationDelivered epoch => 125
09:45:01.599069:     §Invoked: markNotificationDelivered epoch => 126
09:45:01.737431:     §Invoked: markNotificationDelivered epoch => 127

So I would not recommend landing this patch for now.