area515 / Photonic3D

Control software for resin 3D printers
http://photonic3d.com
GNU General Public License v3.0

Intermittent Test Failures: org.area515.resinprinter.security.keystore.RendezvousExchange.messageExchange #246

Closed jmkao closed 7 years ago

jmkao commented 7 years ago

CI builds seem to be intermittently failing in Travis:

    java.lang.IllegalStateException: Blocking message pending 10000 for BLOCKING
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.lockMsg(WebSocketRemoteEndpoint.java:130)
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendBytes(WebSocketRemoteEndpoint.java:244)
        at org.area515.resinprinter.security.keystore.IncomingHttpTunnel.sendMessage(IncomingHttpTunnel.java:144)
        at org.area515.resinprinter.security.keystore.RendezvousClient.sendRequestToRemote(RendezvousClient.java:215)
        at org.area515.resinprinter.security.keystore.RendezvousExchange.messageExchange(RendezvousExchange.java:115)

This is a weird one: I can't reproduce it on my desktop Linux box at all, but in Travis it seems to occur about 20% of the time.

http://stackoverflow.com/questions/26264508/websocket-async-send-can-result-in-blocked-send-once-queue-filled

However, I'm not sure whether the socket being tested is sync or async.
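For illustration, the exception text suggests that a blocking send was attempted while another send on the same endpoint was still pending. A minimal sketch of one common way to avoid that is to serialize sends behind a lock. This is not necessarily the fix that was applied in Photonic3D; `BlockingSender` and `SerializedSender` are hypothetical names, and `BlockingSender` only stands in for a blocking endpoint such as Jetty's `RemoteEndpoint`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a blocking WebSocket endpoint (not a Jetty type). */
interface BlockingSender {
    void sendBytes(ByteBuffer data) throws IOException;
}

/** Wraps a sender so that at most one blocking send is in flight at a time. */
final class SerializedSender implements BlockingSender {
    private final BlockingSender delegate;
    private final ReentrantLock lock = new ReentrantLock();

    SerializedSender(BlockingSender delegate) {
        this.delegate = delegate;
    }

    @Override
    public void sendBytes(ByteBuffer data) throws IOException {
        lock.lock();               // callers queue here instead of colliding inside the endpoint
        try {
            delegate.sendBytes(data);
        } finally {
            lock.unlock();         // always release, even if the send throws
        }
    }
}
```

The trade-off is that concurrent callers now wait for each other, so throughput per connection is bounded by the slowest send.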

WesGilster commented 7 years ago

I have this documented as a TODO, but opening a bug is much better. I was hoping I wouldn't have to fix this until I started working on scalability testing, but 20% is way more often than I thought.

WesGilster commented 7 years ago

This should be fixed now. I wrote a test that reproduces the problem about 70% of the time; it seems to be related to the performance of network I/O on the executing platform. The fix for this was simple, but the test I wrote to reproduce it uncovered a much worse problem: I was reusing IVs under highly concurrent situations, and that's a security no-no. My crypto did detect the problem quite nicely, which I was happy with.
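To illustrate the IV point, here is a minimal sketch of generating a fresh random IV for every message and sending it alongside the ciphertext. It assumes AES-GCM with a 12-byte IV and a 128-bit tag, which may differ from what Photonic3D actually uses; `FreshIvCipher` is a hypothetical name, not a class from the project.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical helper: never shares an IV between messages or threads. */
final class FreshIvCipher {
    private static final int IV_BYTES = 12;   // assumed IV length (common for GCM)
    private static final int TAG_BITS = 128;  // assumed authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom(); // thread-safe

    /** Returns IV || ciphertext, with a new random IV on every call. */
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        // A new Cipher per call avoids sharing mutable cipher state across threads.
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(iv.length + ciphertext.length)
                .put(iv)
                .put(ciphertext)
                .array();
    }

    /** Reads the IV from the front of the message, then decrypts and verifies the rest. */
    static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
        return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    }
}
```

Because the IV travels with each message, no IV state is shared between sender threads, so there is nothing to reuse by accident.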

I wasn't quite ready to start scalability testing, but it's nice to know that 15 threads working through 99 HTTP requests works great. That's a pretty good start even if the load is direct from a browser.

jmkao commented 7 years ago

Looks good, I've had a significant number of tests pass without issue.

WesGilster commented 7 years ago

Great, I'll close.