HALON-795: fix start/stop on a big cluster configurations

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: andriytk

The problem was the number of notifications we send in N-process cluster configurations: we sent one notification per process state change, i.e. O(N^2/2) in total when a large number of changes happen at roughly the same time. Halon could not process that many events in a reasonable time. (The timeout is currently 2 minutes.)

Now we accumulate all the process changes into a single notification packet, which is then sent to all participants. We keep accumulating until there have been no more changes for some time (currently 2 seconds) and then send what we have got so far. In theory, this approach might reduce the number of notifications to O(N).
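
For illustration only, the accumulation rule above can be sketched as a pure function over timestamped changes. This is not the code from the patch: `batchByQuietPeriod`, the timestamps, and the string labels are hypothetical, and the sketch assumes timestamps arrive in non-decreasing order.

```haskell
-- | Group timestamped changes into batches. A batch is closed once the gap
-- to the next change is at least the quiet period (in seconds).
-- Assumption: timestamps are non-decreasing.
batchByQuietPeriod :: Double -> [(Double, a)] -> [[a]]
batchByQuietPeriod _ [] = []
batchByQuietPeriod quiet ((t0, x0) : rest) = go t0 [x0] rest
  where
    -- No more input: emit whatever has been accumulated.
    go _ acc [] = [reverse acc]
    go lastT acc ((t, x) : more)
      -- The change arrived within the quiet period: keep accumulating.
      | t - lastT < quiet = go t (x : acc) more
      -- Quiet period elapsed: close the batch and start a new one.
      | otherwise         = reverse acc : go t [x] more

main :: IO ()
main =
  -- Three changes close together and one much later, with a 2-second window.
  print (batchByQuietPeriod 2 [(0.0, "p1"), (0.5, "p2"), (1.9, "p3"), (6.0, "p4")])
  -- prints [["p1","p2","p3"],["p4"]]
```

With this rule, a burst of changes costs one packet per participant instead of one packet per change per participant, which is where the hoped-for O(N) comes from.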

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

Ah, sorry - `a' is used for the "ha service Fid" parameter already.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Well, if you think so.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

OK.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

If you prefer it, I don't mind.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

The description of this parameter is just a few lines above in the comment, so I don't think it is cryptic here at all. And it seems to fit perfectly with the style of the other parameter names here.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

I think that in our case it would be useful to be able to specify a fraction of a second as well.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

As I wrote in the commit description, the m0t1fs mount was not ready yet when `h0 start' finished and the test was started. As a result, of course, it failed. But it happened on my old MacBook (early 2011) host, which is quite slow, and on a VM configured with 2 CPUs, especially just after a reboot of the VM, when systemd takes a lot of CPU (probably doing some maintenance stuff, CentOS-7.5).

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[optional] s/d/a/?

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

[optional] s/adelay/aggrDelay/ There is no actual need to be cryptic.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

All other timeout values in HalonVars are in seconds. I don't see a need to deviate.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Sorry, I didn't mean it to be a yes/no question. :) Would you mind providing some details about the m0t1fs mount issue which you encountered? I never experienced one — not with h0 run-st at least — and am curious to know. TIA.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

Yes.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

Did you have an issue with m0t1fs mount?

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

(Just for the record.)

We need to do at least two things before landing: 1) add a parameter for the accumulation time (not for the public, but rather for developers' convenience when finding an optimal value); 2) update the commit description.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

Let's make it 2 seconds by default then.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

t_stop-reverts-sdev-states ST fails on my devvm even with the latest h795 patch (be13a1b3d).

+ [22] hctl mero stop
Stopping cluster.
Cluster stop initiated.
Process{0x7200000000000001:0x26}: PSStopping -> PSOffline                       
Service{0x7300000000000001:0x27}: SSStopping -> SSOffline                       
Progress: 3.85% -> 11.54%
Process{0x7200000000000001:0x1e}: PSStopping -> PSOffline                       
Service{0x7300000000000001:0x25}: SSStopping -> SSOffline                       
Service{0x7300000000000001:0x1f}: SSStopping -> SSOffline                       
Service{0x7300000000000001:0x22}: SSStopping -> SSOffline                       
Service{0x7300000000000001:0x20}: SSStopping -> SSOffline                       
Service{0x7300000000000001:0x24}: SSStopping -> SSOffline                       
Service{0x7300000000000001:0x21}: SSStopping -> SSOffline                       
Service{0x7300000000000001:0x23}: SSStopping -> SSOffline                       
Progress: 11.54% -> 42.31%
Process{0x7200000000000001:0x1b}: PSStopping -> PSOffline                       
Service{0x7300000000000001:0x1c}: SSStopping -> SSOffline                       
Service{0x7300000000000001:0x1d}: SSStopping -> SSOffline                       
Progress: 42.31% -> 53.85%
Cluster stop failed: StopProcessesOnNodeFailed (Node nid://192.168.195.138:9070:0) "halon:m0d service stop timed out"
9.20user 74.24system 7:06.62elapsed 19%CPU (0avgtext+0avgdata 60480maxresident)k
58704inputs+25616outputs (82major+1640718minor)pagefaults 0swaps

Observation: when the aggregation window is reduced (from 5 seconds to 2), the system test passes successfully.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

The latest patch seems to work fine on a 7 SSUs + 1 CMU + 1 client node setup:

$ cat /var/log/halon.decision.log | awk '/^2019/ {t=$2} /notifyMeroAsynch.epoch/ {printf("%s: %s\n", t, $0)} /markNotificationDelivered/ {m=$0} /fid =/ {p=$6} /tryCompleteStateDiff.remaining.*\[\]/ {printf("%s: %s last=%s\n", t, m, p)}'
...
16:07:39.668848:     §DEBUG notifyMeroAsynch.epoch => 21
16:07:39.693597:     §DEBUG notifyMeroAsynch.epoch => 22
16:07:39.763777:     §DEBUG notifyMeroAsynch.epoch => 23
16:07:39.849773:     §DEBUG notifyMeroAsynch.epoch => 24
16:07:39.882724:     §DEBUG notifyMeroAsynch.epoch => 25
16:07:39.93362:     §DEBUG notifyMeroAsynch.epoch => 26
16:07:40.008601:     §DEBUG notifyMeroAsynch.epoch => 27
16:07:46.313389:     §Invoked: markNotificationDelivered epoch => 21 last=0x7200000000000001:0x15
16:07:46.313389:     §Invoked: markNotificationDelivered epoch => 22 last=0x7200000000000001:0x15
16:07:46.313389:     §Invoked: markNotificationDelivered epoch => 23 last=0x7200000000000001:0x15
16:07:46.313389:     §Invoked: markNotificationDelivered epoch => 24 last=0x7200000000000001:0x15
16:07:46.313389:     §Invoked: markNotificationDelivered epoch => 25 last=0x7200000000000001:0x15
16:07:46.313389:     §Invoked: markNotificationDelivered epoch => 26 last=0x7200000000000001:0x15
16:07:46.313389:     §Invoked: markNotificationDelivered epoch => 27 last=0x7200000000000001:0x15

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

I like it. Thanks!

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

In the last patch I've fixed the handling of non-synchronized notification Acks from different SATs.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

It's where we accumulate the notification messages. MVar can be used not only to share information between threads. In this case we use it as a Modified Variable where we store our messages during accumulation.
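
A minimal sketch of the idea described here, using an `MVar` purely as an accumulator. It is not the code under review: `Change`, `accumulate`, and `flush` are hypothetical names, and only `newMVar`, `modifyMVar_`, and `swapMVar` from `Control.Concurrent.MVar` are real library calls.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, swapMVar)

-- Hypothetical stand-in for a process state change.
type Change = String

-- | Add one change to the accumulator (stored newest first).
accumulate :: MVar [Change] -> Change -> IO ()
accumulate box c = modifyMVar_ box (pure . (c :))

-- | Take everything accumulated so far (oldest first) and reset the accumulator.
flush :: MVar [Change] -> IO [Change]
flush box = reverse <$> swapMVar box []

main :: IO ()
main = do
  box <- newMVar []
  mapM_ (accumulate box) ["p1: PSStopping -> PSOffline", "p2: PSStopping -> PSOffline"]
  batch <- flush box
  print (length batch)  -- both changes end up in one batch
  rest <- flush box
  print (null rest)     -- the accumulator is empty after a flush
```

If the accumulator is only ever touched from a single thread, the same thing can be done by passing the list along as an ordinary value, without an `MVar`.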

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

I've updated the patch. Will continue the review tomorrow.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: vvv

MVar is not needed. There is no state shared between statusProcess threads.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

With this 👆 latest patch h0 run-st finished SUCCESSfully on my devvm.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

Was able to start the cluster with 300 clovis-apps. Tried with more (500), but confd crashed (the configuration is probably too big for it).

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 5 years ago

Created by: andriytk

Oh, sorry Max - I misunderstood your question.

The patch is yet to be tested...

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: max-seagate

It would also be interesting to see how the notifications are grouped during the bootstrap for a large number of processes.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: max-seagate

Sorry, I can't find the maximum number of processes Halon can handle with this patch. Could you please clarify?

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 6 years ago

Created by: andriytk

Max, you can get this info from the ticket description and comments. It's about 200 client processes on a single-node setup or 4*8 client processes on a 6-SSU setup.

Seagate / halon

HALON-795: fix start/stop on a big cluster configurations #1503