JustinTulloss / zeromq.node

Node.js bindings to the zeromq library
MIT License

Issue with 'remove receive message delayed handling' commit #344

prdn opened this issue 10 years ago (status: open)

prdn commented 10 years ago

Hi,

Regarding issue #338: this commit causes the failure below when I push a lot of messages into the queue (1,000 or more):

Assertion failed: (!handle.IsEmpty()), function Unwrap, file ../src/node_object_wrap.h, line 60

I have tested it on OSX and Linux and the result is the same.

Does anyone have the same problem? I think the commit should be reverted.

Thank you, Paolo

kkoopa commented 10 years ago

Don't see such a problem when running the performance tests. What socket type are you using? How can the error be reproduced?

prdn commented 10 years ago

It is a 'dealer' socket and it is defined as follows:

this.socket = zmq.socket('dealer');
this.socket.identity = new Buffer(this.name);
this.socket.setsockopt('linger', 0);

To reproduce this issue you can test my project https://github.com/prdn/zmq-omdp (an extended MDP protocol implementation):

cd ~
mkdir zzz
cd zzz
git clone https://github.com/prdn/zmq-omdp.git
cd zmq-omdp
npm install zmq debug readable-stream
cd examples
node perf.js

In perf.js you'll find 100,000 writes in a loop, which triggers the issue. After reverting the changes from commit #338 it works correctly.

Let me know if you need more details.

kkoopa commented 10 years ago

Cannot reproduce with Node 0.12. Maybe this only affects older versions.

prdn commented 10 years ago

I'm using Node 0.10.

talss89 commented 10 years ago

We're experiencing this issue in 0.10 too. Req/Rep 85,000 messages in quick succession. Going to try 0.12...

prdn commented 10 years ago

It should work with 0.12 but I cannot rely on it for my project.

talss89 commented 10 years ago

Yeah, we can't rely on 0.12 either until it's stable (and some of our deps don't compile on 0.12).

The temporary fix we've come up with is introducing a 50ms delay between sending messages (could be lower maybe - 50ms was a first guess and I'm having a very stressful day). For our application the latency isn't a problem.

@prdn - It might be worth experimenting with 1/2ms delays?
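The delay workaround described above could be sketched roughly like this (a minimal sketch; `sendThrottled` and its arguments are hypothetical helpers, not part of the zmq.node API):

```javascript
// Hypothetical sketch of the delay workaround: send queued messages one
// at a time with a small pause between each, instead of a tight loop.
function sendThrottled(socket, messages, delayMs, onDone) {
  var i = 0;
  (function next() {
    if (i >= messages.length) {
      if (onDone) onDone();
      return;
    }
    socket.send(messages[i++]);
    // delayMs of 50 per talss89, possibly 1-2 per the suggestion above
    setTimeout(next, delayMs);
  })();
}
```

This trades throughput for stability, which is fine when latency isn't a concern, as noted above.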

prdn commented 10 years ago

@talss89 I prefer to revert the change in node_modules/zmq/lib/index.js (around line 500):

do {
    emitArgs.push(this._zmq.recv());
} while (this._receiveMore);

var self = this;
// Handle received messages immediately for receive-only sockets to prevent a memory leak in the driver.
if (types[this.type] === zmq.ZMQ_SUB || types[this.type] === zmq.ZMQ_PULL) {
    self.emit.apply(self, emitArgs);
} else {
    // Allow flush to complete before handling received messages.
    (function(emitArgs) {
        process.nextTick(function() {
            self.emit.apply(self, emitArgs);
        });
    })(emitArgs);
}

if (zmq.STATE_READY != this._zmq.state) {
    break;
}

ronkorving commented 10 years ago

Perhaps the flush loop shouldn't try to "send all, then retrieve all and emit", but rather interleave the two, so it becomes more a "send some, retrieve some and emit, send some more, retrieve some more and emit" kind of loop. Would that solve the issue?
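The interleaving idea could be pictured like this (purely illustrative; these socket methods are hypothetical and do not exist in zmq.node):

```javascript
// Illustrative sketch of an interleaved flush loop: send a small batch of
// outgoing messages, then emit whatever has been received, and repeat,
// instead of sending everything before reading anything.
function flushInterleaved(socket, batchSize) {
  while (socket.hasOutgoing() || socket.hasIncoming()) {
    // Send at most batchSize queued messages.
    for (var i = 0; i < batchSize && socket.hasOutgoing(); i++) {
      socket.sendNext();
    }
    // Then emit anything that has arrived before sending more.
    while (socket.hasIncoming()) {
      socket.emitNext();
    }
  }
}
```

The point of the design is to bound how much received data can pile up unhandled while a large outgoing queue drains.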

ronkorving commented 10 years ago

This could easily be tested, I think, by removing the && this._outgoing.length condition from this line: https://github.com/JustinTulloss/zeromq.node/blob/master/lib/index.js#L519

prdn commented 10 years ago

@ronkorving In my scenario there is only a huge send (about 100000 outgoing messages) and 1 single incoming message. So I think that switching from send to receive wouldn't solve the issue.

ronkorving commented 10 years ago

I would be very interested in analyzing this, but will need the smallest code sample you can come up with to do so.

mneil commented 10 years ago

I was having the same issue as @prdn with dealer sockets on 0.10.29. We also cannot upgrade past 0.10 yet. His suggestion of flushing outgoing messages before handling received input worked perfectly. In my case I'm making 100,000 calls and receiving 100,000 replies back.

Before the change: 500 requests sent at an average of 75.6 ms. After the change: 500 requests at an average of 86.8 ms.

Not enough perf tests to tell the real impact. I think the change is negligible overall.

This is the smallest example I could come up with, testing with mocha/should. I know part of the issue is the massive test loop, which wouldn't happen this way in the real world. But using the dealer/router as a broker with dozens of clients/workers sending messages constantly, I was concerned with how many calls I could proxy and how the buffer would be handled.

var zmq = require('zmq');

it('must work', function (done) {
  var total = 100000,
      recv = 0;

  this.timeout(total * 2);
  var dealer = zmq.socket('dealer');
  var router = zmq.socket('router');

  dealer.connect('tcp://127.0.0.1:3400')
    .on('message', response);
  router.bindSync('tcp://127.0.0.1:3400')
    .on('message', request);

  function request() {
    var requester = arguments[0];
    var id = arguments[1].toString();
    var envelope = arguments[2].toString();

    // do work on msg
    // var msg = arguments[3];

    router.send([requester, id, envelope, JSON.stringify([0, 1])]);
  }

  function response() {
    if (++recv === total)
      done();
  }

  for (var i = 0; i < total; i++) {
    dealer.send([
      Math.random(), 'route',
      JSON.stringify({prop: 'val'})
    ]);
  }
});

You should get

../src/node_object_wrap.h:60: static T* node::ObjectWrap::Unwrap(v8::Handle<v8::Object>)
       [with T = node::Buffer]: Assertion `!handle.IsEmpty()' failed.

Using zeromq 4.0.4 on Ubuntu 14.04 with 2 GB RAM.

sbisbee commented 9 years ago

I have run into this memory leak as well and am also locked into node 0.10 for now. It's probably worth noting that we saw this issue even with 1,000s of messages, not 10,000s.

I patched our production system to use this code block from an older version, modified to properly use this instead of self.

(function(emitArgs) {
  setImmediate(function() {
    this.emit.apply(this, emitArgs);
  }.bind(this));
}.bind(this))(emitArgs);

It looks like this code was introduced in 2.5.1 and continued to live in 2.6.0, but was changed in 2.7.0.

For those still running node 0.10 with this library I'd recommend pinning your dependency to 2.6.0. Our process is staying at <5% memory now instead of leaking and OOM'ing within 5 minutes, with this being the only change.

Confirmed broken: >=2.7.0
Confirmed working: 2.5.1, 2.6.0
Node: 0.10.33
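Pinning the dependency as recommended might look like this in package.json (a sketch; adjust to your own manifest):

```json
{
  "dependencies": {
    "zmq": "2.6.0"
  }
}
```

Using an exact version (no ^ or ~ prefix) ensures npm won't pull in a 2.7.x release that reintroduces the leak.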

reqshark commented 9 years ago

Hey @sbisbee, setImmediate is slower than process.nextTick().

Anyway, is this still a pending issue for people? What about when using io.js?
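For context, the two calls differ in where they run in the event loop: process.nextTick callbacks fire before the loop proceeds, while setImmediate callbacks run in a later check phase, after pending I/O. A minimal ordering demo (my own sketch, not from this thread's code):

```javascript
// Demonstrates scheduling order: nextTick runs before setImmediate,
// because nextTick fires at the end of the current operation while
// setImmediate waits for the event loop's check phase.
var order = [];
setImmediate(function () { order.push('setImmediate'); });
process.nextTick(function () { order.push('nextTick'); });
setTimeout(function () {
  console.log(order.join(' -> ')); // nextTick -> setImmediate
}, 10);
```

That deferral is arguably why sbisbee's setImmediate variant behaves better here: it yields to the event loop before emitting, rather than running before I/O can proceed.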

sbisbee commented 9 years ago

@reqshark It is for us. We have pinned all of our code to 2.6.0, which has proven to be far more stable per my previous comment, until we are able to migrate to node 0.12. io.js is really not an option.

Appreciate the code input. I'd rather not run a fork of someone else's lib for setImmediate() vs process.nextTick() though.

Thanks for the reply!