Closed weissi closed 6 years ago
yes, reproes by just looping it:
run_in_docker bash -c 'while swift test --filter NIOTests.EventLoopTest/testShuttingDownFailsRegistration; do true; done'
did it after 137 runs.
This is the same issue as #334. I'll do a quick sweep over the tests to confirm we aren't making this mistake anywhere else.
Sorry, I don't know why I said that: looking at the code again doesn't seem to back up the idea that this is the same issue.
I'm suspicious of the fact that the first channel registration+connect got wrapped in submit
. I wonder if this ends up creating a race where Channel.close
doesn't get called, because the connection attempt races with the loop.closeGently
in the other line.
As a general rule, you're waiting on submit
blocks, but not waiting on the futures that the calls inside those blocks create. That is probably the wrong thing to do.
Hrm, that theory doesn't seem to be right though, as adding a wait for that connection does not make the problem go away.
Ok, got it.
The issue here is a simple one of timing. The premise of the test is that the WedgeOpenHandler
will keep the channel open by catching calls to close
, which will prevent the selector closing the channel during graceful shutdown. This is true!
However, the WedgeOpenHandler
only prevents that for the client channel. However, there is a server side to that channel too, and that channel is in the same event loop as the client side. That channel is not protected against being closed by the selector, so it will be. That opens a timing window where the selector may cause the client channel to read EOF from the socket. EOF from the socket does not cause a close
to travel through the pipeline, and so this will actually close the client channel directly (via calling close0
with .eof
as the error).
There are a few ways we can deal with this, but the easiest may be to set the client channel to allow remote half closure. This should prevent the reading of EOF causing a channel close. However, it's a bit perplexing and would definitely require a comment (and may not function on all OSes, it depends a bit on how intelligent the kernel is: for example, this definitely would not work if we were using AF_UNIX sockets). The other option is to wedge both the server and client channels open, and then ungate both of them at the end of the test. That one is more complex, but also more likely to be understandable and correct.
Wedge both sound like a better option to me for the reason you said
Expected behavior
EventLoopTest.testShuttingDownFailsRegistration
should be stable and not randomly fail.Actual behavior
from here: https://ci.swiftnio.io:4433/job/swift-nio-master/195/console
Steps to reproduce
dunno, run test in a loop on Linux?
If possible, minimal yet complete reproducer code (or URL to code)
test should be fairly minimal
SwiftNIO version/commit hash
9360bd5c4940f61f301e71497606ec853efa2a8d
Swift & OS version (output of
swift --version && uname -a
)CI, Swift 4.0.3