JustinTulloss / zeromq.node

Node.js bindings to the zeromq library
MIT License

Performance problem #496

Open gtamas opened 8 years ago

gtamas commented 8 years ago

Running make perf gives me these results:

node perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node perf/remote_thr.js tcp://127.0.0.1:5556 1 100000
started receiving
message size: 1 [B]
message count: 100000
mean throughput: 32534 [msg/s]
mean throughput: 0 [Mbit/s]
overall time: 3 secs and 73683107 nanoseconds
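
(The 0 Mbit/s line is presumably just integer truncation: with 1-byte messages, 32534 msg/s works out to roughly 0.26 Mbit/s.)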

Isn't 30K msgs/sec super low by ZMQ standards? I know that reaching millions per second is only possible using the C library directly, but even from Node.js shouldn't this number be much higher?

I mean, it's running on localhost rather than on two separate boxes, and it's only sending 1-byte messages.

Is this related to #373? If it is, is anyone working on improving the performance right now?

I'm using the latest version of the ZMQ binding (2.14.0), Node 5.9.0 and ZeroMQ 4.0.5. I'm on OSX.

ronkorving commented 8 years ago

I get very similar numbers, and they are indeed quite bad. Nobody is actively working on performance at the moment, to my knowledge. The biggest-impact change we can make, imho, is to move a lot of logic to C++ (the _flush method, for example), so that we cross the C++/JS boundary far less often than we do now. I'm quite short on time, but this is definitely something I (and others) would still like to see happen. If you're interested, let me know.
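
To make that concrete, here is a minimal sketch of what a native flush loop could look like. It is illustrative only and not the module's actual code (the Pending type and function name are made up); the point is that the JS/C++ boundary is crossed once per batch instead of once per message:

```cpp
#include <zmq.h>
#include <deque>
#include <vector>

struct Pending { std::vector<char> bytes; };  // one queued outgoing message

// Drain as many queued messages as the socket will accept in one native
// call, so JS only calls in (and gets the result back) once per batch.
size_t flush_native(void* socket, std::deque<Pending>& queue) {
  size_t sent = 0;
  while (!queue.empty()) {
    int events = 0;
    size_t len = sizeof(events);
    zmq_getsockopt(socket, ZMQ_EVENTS, &events, &len);
    if (!(events & ZMQ_POLLOUT)) break;  // socket not writable: stop for now

    Pending& m = queue.front();
    if (zmq_send(socket, m.bytes.data(), m.bytes.size(), ZMQ_DONTWAIT) < 0)
      break;  // EAGAIN or a real error; leave the message queued
    queue.pop_front();
    ++sent;
  }
  return sent;  // caller in JS land learns how many were flushed
}
```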

reqshark commented 8 years ago

In my opinion, such a change will result in zmq performing better than the nanomsg module.

ronkorving commented 8 years ago

Looking a bit deeper into it, btw, one major problem we have is that we copy every buffer before sending (to avoid GC issues). This is really detrimental to performance, and there must be a better way. Node solves these kinds of issues by retaining a reference through a closure. It seems that zmq_msg_init_data (available since zmq 2) could give us the callback hook we need to keep the reference alive and leave garbage collection to V8 (instead of actually freeing the data then and there). The old OutgoingMessage class in the codebase seems to have been designed to do almost exactly that (the class is currently unused). I've moved this idea to #498
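
For illustration, a rough sketch of how zmq_msg_init_data could enable a zero-copy send, assuming NAN-style bindings; every name here is made up, and the uv_async hop exists because zmq invokes the free callback on one of its own IO threads, where touching V8 would be unsafe:

```cpp
#include <nan.h>
#include <zmq.h>

// Keeps a sent Buffer alive until zmq reports it no longer needs the bytes.
struct BufferRef {
  Nan::Persistent<v8::Object> handle;  // pins the Buffer against GC
  uv_async_t async;                    // lets us hop back to the loop thread
};

// Runs on the libuv loop thread, so it is safe to touch V8 here.
static void ReleaseOnLoop(uv_async_t* async) {
  BufferRef* ref = static_cast<BufferRef*>(async->data);
  ref->handle.Reset();  // V8 may now collect the Buffer
  uv_close(reinterpret_cast<uv_handle_t*>(async),
           [](uv_handle_t* h) { delete static_cast<BufferRef*>(h->data); });
}

// zmq calls this from one of its IO threads once the data has been sent,
// so we only signal the loop thread instead of touching V8 directly.
static void ZmqFreeHook(void* /*data*/, void* hint) {
  uv_async_send(&static_cast<BufferRef*>(hint)->async);
}

// Zero-copy send: wrap the Buffer's memory directly in a zmq message.
static int SendBufferNoCopy(void* socket, v8::Local<v8::Object> buf, int flags) {
  BufferRef* ref = new BufferRef();
  ref->handle.Reset(buf);
  ref->async.data = ref;
  uv_async_init(uv_default_loop(), &ref->async, ReleaseOnLoop);

  zmq_msg_t msg;
  zmq_msg_init_data(&msg, node::Buffer::Data(buf), node::Buffer::Length(buf),
                    ZmqFreeHook, ref);
  return zmq_msg_send(&msg, socket, flags);
}
```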

Another realization is that our GetSockOpt looks incredibly slow because of all the searching we do to ensure we use the right type. This seems like a massive waste, and we call getsockopt a lot during a flush. I have no suggestions on how to speed that part of the code up, though.
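
One possible direction, purely as a hedged sketch (the helper name is made up and the option coverage is abbreviated): resolve each option's value width with a constant-time switch instead of searching lists of option ids on every call.

```cpp
#include <zmq.h>
#include <stdint.h>

// Illustrative only: a switch (compiled to a jump table) picks the right
// value width per option in constant time.
static int64_t GetIntSockOpt(void* socket, int option) {
  switch (option) {
    case ZMQ_EVENTS:  // these options are plain ints
    case ZMQ_SNDHWM:
    case ZMQ_RCVHWM: {
      int value = 0;
      size_t len = sizeof(value);
      if (zmq_getsockopt(socket, option, &value, &len) != 0) return -1;
      return value;
    }
    case ZMQ_MAXMSGSIZE: {  // 64-bit option
      int64_t value = 0;
      size_t len = sizeof(value);
      if (zmq_getsockopt(socket, option, &value, &len) != 0) return -1;
      return value;
    }
    default:
      return -1;  // unknown or non-integer option
  }
}
```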

ronkorving commented 8 years ago

I'm actually spending some time on this today. Expect a PR with some benchmark numbers.

ronkorving commented 8 years ago

See #499, which triples the throughput. That's not the end of it, though: buffers are still being copied, so I think there is still a lot to win. But consider this step 1.