prdn opened this issue 10 years ago
Don't see such a problem when running the performance tests. What socket type are you using? How can the error be reproduced?
It is a 'dealer' socket and it is defined as follows:
this.socket = zmq.socket('dealer');
this.socket.identity = new Buffer(this.name);
this.socket.setsockopt('linger', 0);
To reproduce this issue you can test my project https://github.com/prdn/zmq-omdp (an extended MDP protocol implementation):
cd ~
mkdir zzz
cd zzz
git clone https://github.com/prdn/zmq-omdp.git
cd zmq-omdp
npm install zmq debug readable-stream
cd examples
node perf.js
In perf.js you'll find 100,000 writes in a loop; this triggers the issue. Reverting the changes from #338 makes it work correctly.
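To give an idea of the pattern, it's roughly the following (not the actual perf.js, which lives in the repo's examples folder; the endpoint, identity, and payload here are just placeholders):

var zmq = require('zmq');

var socket = zmq.socket('dealer');
socket.identity = new Buffer('perf-client');
socket.setsockopt('linger', 0);
socket.connect('tcp://127.0.0.1:5555');

// on the order of 100,000 sends in a tight loop is enough to trigger the assertion
for (var i = 0; i < 100000; i++) {
  socket.send(['ping', JSON.stringify({ seq: i })]);
}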
Let me know if you need more details.
Cannot reproduce with Node 0.12. Maybe this only affects earlier versions.
I'm using Node 0.10.
We're experiencing this issue in 0.10 too, with REQ/REP and 85,000 messages in quick succession. Going to try 0.12...
It should work with 0.12 but I cannot rely on it for my project.
Yeah, we can't rely on 0.12 either until it's stable (and some of our deps don't compile on 0.12).
The temporary fix we've come up with is introducing a 50ms delay between sending messages (could be lower maybe - 50ms was a first guess and I'm having a very stressful day). For our application the latency isn't a problem.
@prdn - It might be worth experimenting with 1/2ms delays?
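Roughly what we do is the sketch below: queue the outgoing messages and drain them on a timer instead of calling send() in a tight loop. The helper names and the 50ms interval are our own (nothing from the zmq lib itself):

// Sketch only: throttle sends by draining a queue on a timer.
var pending = [];

function queueSend(msg) {
  pending.push(msg);
}

function startDrain(socket, intervalMs) {
  return setInterval(function () {
    if (pending.length) {
      socket.send(pending.shift());
    }
  }, intervalMs);
}

// usage:
// var timer = startDrain(dealer, 50);
// queueSend(['route', JSON.stringify({ prop: 'val' })]);
// clearInterval(timer) when done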
@talss89 I prefer to revert the change in node_modules/zmq/lib/index.js (around line 500):

do {
  emitArgs.push(this._zmq.recv());
} while (this._receiveMore);

var self = this;

// Handle received message immediately for receive only sockets to prevent memory leak in driver
if (types[this.type] === zmq.ZMQ_SUB || types[this.type] === zmq.ZMQ_PULL) {
  self.emit.apply(self, emitArgs);
} else {
  // Allows flush to complete before handling received messages.
  (function(emitArgs) {
    process.nextTick(function(){
      self.emit.apply(self, emitArgs);
    });
  })(emitArgs);
}

if (zmq.STATE_READY != this._zmq.state) {
  break;
}
Perhaps the flush loop shouldn't try to "send all, then retrieve all and emit", but rather interleave the two, so it becomes more a "send some, retrieve some and emit, send some more, retrieve some more and emit" kind of loop. Would that solve the issue?
This could easily be tested, I think, by removing the && this._outgoing.length condition from https://github.com/JustinTulloss/zeromq.node/blob/master/lib/index.js#L519.
@ronkorving In my scenario there is only one huge send (about 100,000 outgoing messages) and a single incoming message. So I don't think that interleaving send and receive would solve the issue.
I would be very interested in analyzing this, but will need the smallest code sample you can come up with to do so.
I was having the same issue as @prdn with dealer sockets on 0.10.29. We also cannot upgrade past 0.10 yet. His suggestion of flushing the outgoing messages before receiving input worked perfectly. In my case I'm making 100,000 calls and receiving 100,000 replies back.
Before the change: 500 requests sent at an average of 75.6ms. After the change: 500 requests at an average of 86.8ms.
Not enough perf tests to tell the real impact, but I think the change is negligible overall.
This is the smallest example I could come up with (testing with mocha/should). I know part of the issue is the massive test loop, which wouldn't happen this way in the real world. But since we use dealer/router as a broker with dozens of clients/workers sending messages constantly, I was concerned with how many calls I could proxy and how the buffer would be handled.
// assumes the zmq module is installed (npm install zmq); run with mocha
var zmq = require('zmq');

it('must work', function (done) {
  var total = 100000,
      recv = 0;

  this.timeout(total * 2);

  var dealer = zmq.socket('dealer');
  var router = zmq.socket('router');

  dealer.connect('tcp://127.0.0.1:3400')
    .on('message', response);
  router.bindSync('tcp://127.0.0.1:3400')
    .on('message', request);

  function request() {
    var requester = arguments[0];
    var id = arguments[1].toString();
    var envelop = arguments[2].toString();
    // do work on msg
    // var msg = arguments[3];
    router.send([requester, id, envelop, JSON.stringify([0, 1])]);
  }

  function response() {
    if (++recv === total)
      done();
  }

  for (var i = 0; i < total; i++) {
    dealer.send([
      Math.random(), 'route',
      JSON.stringify({ prop: 'val' })
    ]);
  }
});
You should get:

../src/node_object_wrap.h:60: static T* node::ObjectWrap::Unwrap(v8::Handle<v8::Object>) [with T = node::Buffer]: Assertion `!handle.IsEmpty()' failed.
Using zeromq-4.0.4 on Ubuntu 14.04 with 2GB RAM.
I have run into this memory leak as well and am also locked into Node 0.10 for now. It's probably worth noting that we saw this issue even with thousands of messages, not tens of thousands.
I patched our production system to use this code block from an older version, modified to properly use this instead of self:
(function(emitArgs) {
  setImmediate(function() {
    this.emit.apply(this, emitArgs);
  }.bind(this));
}.bind(this))(emitArgs);
It looks like this code was introduced in 2.5.1 and continued to live in 2.6.0, but was changed in 2.7.0.
For those still running node 0.10 with this library I'd recommend pinning your dependency to 2.6.0. Our process is staying at <5% memory now instead of leaking and OOM'ing within 5 minutes, with this being the only change.
Confirmed broken: >= 2.7.0
Confirmed works: 2.5.1, 2.6.0
Node: 0.10.33
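For anyone unfamiliar with pinning: an exact version (no ^ or ~ range) in package.json is enough, something like:

{
  "dependencies": {
    "zmq": "2.6.0"
  }
}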
hey @sbisbee, setImmediate is slower than process.nextTick().
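For a rough sense of the difference, an illustrative micro-benchmark (nothing from the lib; numbers will vary a lot by machine and Node version):

var N = 100000;

// schedule N callbacks with the given scheduler and report how long it
// takes for the last one to fire
function bench(schedule, label, done) {
  var remaining = N;
  var start = Date.now();
  for (var i = 0; i < N; i++) {
    schedule(function () {
      if (--remaining === 0) {
        console.log(label + ': ' + (Date.now() - start) + 'ms');
        if (done) done();
      }
    });
  }
}

bench(process.nextTick, 'process.nextTick', function () {
  bench(setImmediate, 'setImmediate');
});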
Anyway, is this still a pending issue for people? What about when using io.js?
@reqshark It is for us. We have pinned all of our code to 2.6.0, which has proven to be far more stable per my previous comment, until we are able to migrate to node 0.12. iojs is really not an option.
Appreciate the code input. I'd rather not run a fork of someone else's lib for setImmediate() vs process.nextTick(), though.
Thanks for the reply!
Hi,

Regarding #338: this commit is causing the issue below. It happens when I'm pushing a lot of messages into the queue (1,000 or more).

Assertion failed: (!handle.IsEmpty()), function Unwrap, file ../src/node_object_wrap.h, line 60

I have tested it on OSX and Linux and the result is the same.

Does anyone have the same problem? I think it should be reverted to the original.

Thank you,
Paolo