Seagate / halon

High availability solution
Apache License 2.0

WIP: HALON-876: fix 'Data.Binary.Get.runGet at position NNN: not enough bytes' issue #1563

Closed andriytk closed 5 years ago

andriytk commented 5 years ago

Sometimes, during cluster start/stop, when some of the halond processes crash (due to an SSU crash, especially the RC one on a multi-TS-node setup), or when such a process restarts later, some of the halond processes may fail with the following error in the system log and then get stuck:

Jan 23 20:30:29 ssu2 halond[4721]: halond: Data.Binary.Get.runGet at position 37425: not enough bytes
Jan 23 20:30:29 ssu2 halond[4721]: CallStack (from HasCallStack):
Jan 23 20:30:29 ssu2 halond[4721]: error, called at src/Data/Binary/Get.hs:342:5 in binary-0.8.3.02pMP266HkYr163LicEzfmx:Data.Binary.Get
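
For context, the message above is what `Data.Binary.Get.runGet` raises (via `error`) when the input ends before the decoder has consumed everything it expects. The sketch below is not halon or network-transport-tcp code. `pairGet` and `decodePair` are hypothetical names, and the 12-byte layout is made up for illustration. It only contrasts `runGet`, which throws on truncated input, with `runGetOrFail`, which returns the failure position and message as a value:

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.Binary.Get (Get, getWord32be, getWord64be, runGet, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32, Word64)

-- Hypothetical decoder (not from halon): a 4-byte field followed by an
-- 8-byte field, so it needs 12 bytes in total.
pairGet :: Get (Word32, Word64)
pairGet = (,) <$> getWord32be <*> getWord64be

-- Total variant: truncated input is reported as a Left value with the
-- byte offset and the decoder's message, instead of an exception.
decodePair :: BL.ByteString -> Either String (Word32, Word64)
decodePair bytes = case runGetOrFail pairGet bytes of
  Left (_, offset, msg) -> Left ("at position " ++ show offset ++ ": " ++ msg)
  Right (_, _, value)   -> Right value

main :: IO ()
main = do
  let complete  = BL.pack [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2]
      truncated = BL.take 6 complete  -- simulate a message cut short
  print (decodePair complete)   -- decodes successfully
  print (decodePair truncated)  -- a Left describing the failure
  -- runGet on the same truncated input calls `error`; catch it here so the
  -- program keeps running instead of dying like the halond process above.
  r <- try (evaluate (runGet pairGet truncated))
         :: IO (Either ErrorCall (Word32, Word64))
  putStrLn (either (const "runGet threw an ErrorCall") show r)
```

Whether the affected code path can simply switch to `runGetOrFail` is an assumption here. The actual fix discussed in this thread is in network-transport-tcp.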

andriytk commented 5 years ago

The fix for the network-transport-tcp package has landed on GitHub, and we have been using the fixed version since commit a88b4e56.

andriytk commented 5 years ago

closed

andriytk commented 5 years ago

Actually, the diff is not that huge, see https://github.com/haskell-distributed/network-transport-tcp/pull/86. Let's wait a few weeks for it to land there. If it doesn't, we will apply it locally here.

vvv commented 5 years ago

Kindly describe — in the merge commit message and in the merge request description — how this chain of commits solves the problem. It's hard to tell looking at the diff alone — the diff is huge.

vvv commented 5 years ago

Would you prefer all 18 commits to be merged as they are, or would you rather clean them up a bit (e.g., squash "revert" commits)?

In the former case I'll create an additional "merge" commit.

andriytk commented 5 years ago

added 1 commit


andriytk commented 5 years ago

added 8 commits


andriytk commented 5 years ago

added 4 commits


andriytk commented 5 years ago

added 4 commits


andriytk commented 5 years ago

changed the description

andriytk commented 5 years ago

changed the description

andriytk commented 5 years ago

changed title from WIP: {-check network-transport change for EndPointAddress Binary instance implementation-} to WIP: {+HALON-876: fix 'Data.Binary.Get.runGet at position NNN: not enough bytes' issue+}

andriytk commented 5 years ago

added 4 commits


andriytk commented 5 years ago

added 3 commits


andriytk commented 5 years ago

added 3 commits


andriytk commented 5 years ago

added 5 commits

