Closed messa closed 8 years ago
+1
I was experiencing a similar issue in the project I am working on. Had to revert to 2.14 for the time being.
+1
Same here, 2.15.0 is completely unusable for us, reverted and pinned 2.14.0
hitting same problem, please fix and do a new release asap our CI pipeline got hosed yesterday by this
+1 our production got hosed yesterday by this...
Yep, spent a few cycles on this. Moving back to 2.13.
is it this commit that introduced the bug? https://github.com/JustinTulloss/zeromq.node/commit/4d161401227d7e78b524e92e09a1b72a5ffa8e2a
looks like a lot of changes to how the socket is read from
2.15.0 does not behave well for me either: consecutive req-socket writes did not go through, except the first one.
Sadly the maintainer is on vacation after doing an extended refactoring. We're not likely to see any merges of patches we provide to fix this in a short amount of time. It's worth investigating though.
same here, we use zmq to create a microservices using http://micro-toolkit.github.io/info and we started to have problems were some services started to be unresponsive, since this is minor version and don't have lockdown the new services were using 2.15 and this behaviour started to appear.
going to lock the libraries to use the stable 2.14 for the time being.
Hi guys. I was indeed off for 3 weeks. Sad to have to come back to this, apologies for the massive inconvenience. Does anyone have a patch for this yet by any chance?
cc @reqshark @kkoopa
Could you guys please all share which version of ZMQ and Node.js you are using?
OSX: nodejs : 4.2.4 zmq : 4.1.3
zmq 4.1.4 node 4.4.4
@messa can you check if the onReady callback is called in the cases where it doesn't work (and your demo ends up depending on your setInterval to pull out the messages). Ie, can you add a console.log to https://github.com/JustinTulloss/zeromq.node/blob/115585463895e95fdd9ddc37a51841d113b0f5fa/lib/index.js#L296 and also use it to print the values of these 2 booleans? Thanks.
Alright, by creating 2 separate processes I've been able to reproduce the issue. I also noticed that in the case of the missing messages, onReady is not called on the dealer end (which should receive the echoed message coming back from the router server).
Next, I've noticed that the UV_PollCallback is not invoked on the dealer end. The Ref/Unref API addition seems to have no impact, since when I commented out the body of the AttachToEventLoop and DetachFromEventLoop, it made no difference.
Confirmed that the problem was introduced by commit 4d161401227d7e78b524e92e09a1b72a5ffa8e2a
It seems the problem is caused by Sendv
sending a number of message batches in the same tick, while a response is received in between. It appears that reading ZMQ_EVENTS
as part of the send process in Sendv
will prevent the uv poller from notifying us that a message can be read. I think whenever we test for ZMQ_EVENTS
we need to invoke onReady for reading immediately if the ZMQ_POLLIN
flag is set.
I think the solution will be to have a separate onReady for reads and for writes, and call them whenever appropriate. That means not just from the poller, but calling onReadReady from Send when ZMQ_EVENTS indicates ZMQ_POLLIN for example.
I have found that sometimes the messages are received by
socket.on('message', callback)
only after I explicitly callsocket._flushReads()
. It looks like this problem occurs when two or more messages are sent quickly on that socket before.I have made a (minimal) demonstration of this problem: https://gist.github.com/messa/862638ab44ca65f712fe4d6ef79aeb67
This problem appears also sometimes in our node app suddenly after other messages (responses) were received successfully before so it is apparently not caused by socket possibly still initializing or not being connected yet or similar.