JustinTulloss / zeromq.node

Node.js bindings to the zeromq library
MIT License
1.65k stars 286 forks source link

Need to explicitly call _flushReads to receive messages #523

Closed messa closed 8 years ago

messa commented 8 years ago

I have found that sometimes the messages are received by socket.on('message', callback) only after I explicitly call socket._flushReads(). It looks like this problem occurs when two or more messages are sent quickly on that socket before.

I have made a (minimal) demonstration of this problem: https://gist.github.com/messa/862638ab44ca65f712fe4d6ef79aeb67

This problem appears also sometimes in our node app suddenly after other messages (responses) were received successfully before so it is apparently not caused by socket possibly still initializing or not being connected yet or similar.

corn3lius commented 8 years ago

+1

I was experiencing a similar issue in the project I am working on. Had to revert to 2.14 for the time being.

quinndiggity commented 8 years ago

+1

Same here, 2.15.0 is completely unusable for us, reverted and pinned 2.14.0

diwu1989 commented 8 years ago

hitting same problem, please fix and do a new release asap our CI pipeline got hosed yesterday by this

RavivIsraeli commented 8 years ago

+1 our production got hosed yesterday by this...

jreeme commented 8 years ago

Yep, spent a few cycles on this. Moving back to 2.13.

diwu1989 commented 8 years ago

is it this commit that introduced the bug? https://github.com/JustinTulloss/zeromq.node/commit/4d161401227d7e78b524e92e09a1b72a5ffa8e2a

looks like a lot of changes to how the socket is read from

andresmeidla commented 8 years ago

2.15.0 does not behave well for me either: consecutive req-socket writes did not go through, except the first one.

rgbkrk commented 8 years ago

Sadly the maintainer is on vacation after doing an extended refactoring. We're not likely to see any merges of patches we provide to fix this in a short amount of time. It's worth investigating though.

pjanuario commented 8 years ago

same here, we use zmq to create a microservices using http://micro-toolkit.github.io/info and we started to have problems were some services started to be unresponsive, since this is minor version and don't have lockdown the new services were using 2.15 and this behaviour started to appear.

going to lock the libraries to use the stable 2.14 for the time being.

ronkorving commented 8 years ago

Hi guys. I was indeed off for 3 weeks. Sad to have to come back to this, apologies for the massive inconvenience. Does anyone have a patch for this yet by any chance?

ronkorving commented 8 years ago

cc @reqshark @kkoopa

ronkorving commented 8 years ago

Could you guys please all share which version of ZMQ and Node.js you are using?

corn3lius commented 8 years ago

OSX: nodejs : 4.2.4 zmq : 4.1.3

quinndiggity commented 8 years ago

zmq 4.1.4 node 4.4.4

ronkorving commented 8 years ago

@messa can you check if the onReady callback is called in the cases where it doesn't work (and your demo ends up depending on your setInterval to pull out the messages). Ie, can you add a console.log to https://github.com/JustinTulloss/zeromq.node/blob/115585463895e95fdd9ddc37a51841d113b0f5fa/lib/index.js#L296 and also use it to print the values of these 2 booleans? Thanks.

ronkorving commented 8 years ago

Alright, by creating 2 separate processes I've been able to reproduce the issue. I also noticed that in the case of the missing messages, onReady is not called on the dealer end (which should receive the echoed message coming back from the router server).

Next, I've noticed that the UV_PollCallback is not invoked on the dealer end. The Ref/Unref API addition seems to have no impact, since when I commented out the body of the AttachToEventLoop and DetachFromEventLoop, it made no difference.

Confirmed that the problem was introduced by commit 4d161401227d7e78b524e92e09a1b72a5ffa8e2a

ronkorving commented 8 years ago

It seems the problem is caused by Sendv sending a number of message batches in the same tick, while a response is received in between. It appears that reading ZMQ_EVENTS as part of the send process in Sendv will prevent the uv poller from notifying us that a message can be read. I think whenever we test for ZMQ_EVENTS we need to invoke onReady for reading immediately if the ZMQ_POLLIN flag is set.

ronkorving commented 8 years ago

I think the solution will be to have a separate onReady for reads and for writes, and call them whenever appropriate. That means not just from the poller, but calling onReadReady from Send when ZMQ_EVENTS indicates ZMQ_POLLIN for example.