helins opened 1 year ago
Hmm, 2 regions may be a slightly weird edge case. Both regions presumably have fast internal connections, so you may get two "blocks" of peers that disagree with each other before they have a chance to align across regions. Will take a look at this case.
We should be enabling fork recovery soon anyway, in which case this becomes a less serious issue.
But why would 2 regions be radically different from 3?
In the few 3-region runs I did, only one run was affected: a single peer logged just 28 such exceptions.
Curiously, it also showed up in a single-region run with 36 peers and only 1 user, albeit to a much lesser extent, and that run did complete (exception counts per peer log):
R1-36-LATENCY/log/peer/27.cvx:4
R1-36-LATENCY/log/peer/32.cvx:2
R1-36-LATENCY/log/peer/30.cvx:6
R1-36-LATENCY/log/peer/25.cvx:2
R1-36-LATENCY/log/peer/3.cvx:2
R1-36-LATENCY/log/peer/28.cvx:2
R1-36-LATENCY/log/peer/16.cvx:2
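The listing above looks like `grep -c` output, which prints `file:count` when given several files (an assumption about how it was produced). A minimal sketch for summarizing such a listing, so runs can be compared by number of affected peers and total exceptions, might look like this. The function names and the `<peer-id>.cvx` naming pattern are hypothetical, inferred only from the paths shown.

```python
import re

# Listing copied from the single-region, 36-peer run above.
LISTING = """\
R1-36-LATENCY/log/peer/27.cvx:4
R1-36-LATENCY/log/peer/32.cvx:2
R1-36-LATENCY/log/peer/30.cvx:6
R1-36-LATENCY/log/peer/25.cvx:2
R1-36-LATENCY/log/peer/3.cvx:2
R1-36-LATENCY/log/peer/28.cvx:2
R1-36-LATENCY/log/peer/16.cvx:2
"""

# Assumption: peer logs are named <peer-id>.cvx; each line is "path:count".
LINE_RE = re.compile(r"^(?P<path>.*/(?P<peer>\d+)\.cvx):(?P<count>\d+)$")


def parse_counts(text: str) -> dict[int, int]:
    """Hypothetical helper: map peer id -> exception count from its log."""
    counts: dict[int, int] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # silently skip lines that do not match the expected format
            counts[int(m["peer"])] = int(m["count"])
    return counts


def summarize(counts: dict[int, int], total_peers: int) -> dict[str, int]:
    """Hypothetical helper: aggregate per-peer counts for one run."""
    affected = [c for c in counts.values() if c > 0]
    return {
        "affected_peers": len(affected),
        "total_peers": total_peers,
        "total_exceptions": sum(affected),
        "max_per_peer": max(affected, default=0),
    }


if __name__ == "__main__":
    print(summarize(parse_counts(LISTING), total_peers=36))
```

For this run it reports 20 exceptions spread over 7 of the 36 peers, with at most 6 on a single peer.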
This seems to happen especially when 2 regions are involved, for reasons that are still unknown.
Example of a distribution (counts) for a 2-region, 24-peer setup and a 10-minute run:
It is odd that it happens so systematically. A real fork should be a rare event, especially when running in reliable data centers.