This unit test failed only occasionally, which made it very difficult to debug, since each run takes approximately 3 minutes. I'm not entirely sure whether this change fixes the problem or just pushes the failure frequency so low that I haven't hit another error.
In any case, I believe the failure was caused by some sort of race condition inside factomd that shows up when the loops run too fast. runtime.Gosched() only lets the scheduler run other goroutines; it doesn't put the calling goroutine to sleep, so all the loops here were effectively running at full speed.
That means the same problem could potentially occur in a real-world scenario, though it should be noted that the test doesn't entirely reproduce the real-world environment. A node dropping off the actual network behaves differently from a single simulated node on the same computer having its network toggled off.