Open jmparra opened 9 years ago
During this test, how much CPU do these 2 processes consume? I wonder if Node is mostly idle or not. If not, it would be great to see some flame graphs.
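As an aside, a minimal sketch of one way to answer the "is Node mostly idle?" question from inside the process, using `process.cpuUsage()` (available in Node.js >= 6.1, so not in the old 0.11.x builds discussed below; `doWork` is a hypothetical stand-in for the benchmark loop, not code from this repo):

```javascript
// Hypothetical stand-in for the actual send/recv benchmark loop.
function doWork() {
  let sum = 0;
  for (let i = 0; i < 1e7; i++) sum += i; // busy loop just to burn CPU
  return sum;
}

const startUsage = process.cpuUsage(); // CPU time consumed so far
const startTime = process.hrtime();    // wall-clock reference

doWork();

const diff = process.cpuUsage(startUsage);     // {user, system} in microseconds
const [sec, nsec] = process.hrtime(startTime); // elapsed wall-clock time
const wallMicros = sec * 1e6 + nsec / 1e3;
const busyPercent = 100 * (diff.user + diff.system) / wallMicros;

console.log(`CPU busy: ${busyPercent.toFixed(1)}%`);
```

A mostly-idle process would report a low percentage here even while the benchmark is "running", which would point at scheduling overhead rather than raw compute.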
I wonder if we ported the flush method to pure C++, how much we would gain. Inside that function we jump between native and JS land a lot. Those jumps are costly. And this function is where we really spend all of our time.
I enjoy looking at throughput. It's a nice average, though there are substantial differences across the sizes of our message envelopes.
I ran the test again; CPU usage is 20-70% in both cases. I took the opportunity to draw a flame graph of the stack (with Node.js v11.14): http://jsfiddle.net/6k0p2yry/
@ronkorving you're absolutely right, the flush method spends a lot of time...
It would be a great experiment to port _flush to C++. If anyone's up for it..? :)
@jmparra, I think we can also explain why this is happening if we look more directly at Python.
First of all, Python is a much slower language than Node.js. But, similar to Lua, Python can define and use C/C++ library ABIs natively, providing direct function access without an additional binary component. I think the official term for it is a Foreign Function Interface (FFI).
Python does this when you import ctypes, which I believe, given a small enough codebase, can completely bypass latencies experienced elsewhere in the running interpreter. I've found this to be true; see for example: https://github.com/reqshark/sublimezmq/blob/master/zmq.py
Hence my suggestion for porting flush. It would do away with most of the bad parts of JS/native bindings.
+1, there's a lot of context switching there between JS and the binding
Also, if we rewrote it using poll/select instead, it could really improve perf there: http://linux.die.net/man/2/select
libuv (Node.js internals) can accomplish this with the uv_poll_t handle (http://docs.libuv.org/en/latest/poll.html), but we would have to include the uv.h header, which shouldn't be an issue since we're running Node anyway...
+1 for libuv, I think it's the better choice because it focuses on asynchronous I/O, but I don't know how much improvement it provides (i.e. whether it actually offers better performance than select).
On my computer I get: Python: 50k requests/sec; Node: 10k requests/sec.
I think the test should be fixed a little bit. Here you use a setImmediate to push messages, but I suggest a simple while(1) loop.
With this change, on my computer I get: Node.js: 20k requests/sec.
However, Python is still faster, so we have to deal with that.
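To illustrate the difference between the two driver styles being compared here (a sketch only; `sendOne` is a hypothetical stand-in for pushing one message to the socket, not this library's API):

```javascript
let sent = 0;
function sendOne() { sent++; } // hypothetical stand-in for a zmq send

// Style 1: schedule each message via setImmediate. Every iteration takes a
// full trip through the event loop, adding scheduling overhead per message.
function driveWithSetImmediate(remaining, done) {
  if (remaining === 0) return done();
  sendOne();
  setImmediate(() => driveWithSetImmediate(remaining - 1, done));
}

// Style 2: a tight synchronous loop ("while(1)"-style). No event-loop trips,
// so it measures the raw send path; note it also starves I/O callbacks, so a
// real ZeroMQ benchmark would still need to yield occasionally.
function driveWithLoop(count) {
  for (let i = 0; i < count; i++) sendOne();
}

driveWithLoop(1000);
console.log(sent); // 1000
```

The setImmediate driver measures the send path plus one event-loop turn per message, which can dominate when the send itself is cheap.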
@ronkorving is right. We should move _flush to C++. If we move the flush loop and the emitter to C++, the perf benefits could be very interesting. Can we use a solution like this one?
Do we even need a native event emitter? There are only 2 callbacks that need calling: one for passing along errors, and one for passing along received messages. Those could be fixed, and in turn emit, right? (I'm no expert at all in native Node land, however, so maybe I'm talking nonsense?)
I see that the _flush method emits an event for received messages. So if we move the _flush function to C++, we would also need a C++ event emitter. Am I missing something?
I also think we should split _flush into two methods: _send and _recv. The send method should only call send. I'm running a couple of tests and there should be an interesting performance increase. In any case, moving that code to C++ should be enough.
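A sketch of what decoupling _flush into independent send and receive paths could look like (`fakeSend`/`fakeRecv` are hypothetical stand-ins for the native zmq calls, not this library's API):

```javascript
// Hypothetical native layer: a writable socket and two pending replies.
const outgoing = [];
const incoming = ['reply-1', 'reply-2'];
function fakeSend(msg) { outgoing.push(msg); return true; }
function fakeRecv() { return incoming.shift(); } // undefined when drained

// _send only drains the outgoing queue; it never touches receive state.
function _send(queue) {
  while (queue.length > 0) {
    if (!fakeSend(queue.shift())) break; // would retry when writable again
  }
}

// _recv only pulls pending messages; it never touches send state.
function _recv(onMessage) {
  let msg;
  while ((msg = fakeRecv()) !== undefined) onMessage(msg);
}

const sendQueue = ['req-1', 'req-2'];
const got = [];
_send(sendQueue);
_recv((m) => got.push(m));
console.log(outgoing, got); // [ 'req-1', 'req-2' ] [ 'reply-1', 'reply-2' ]
```

Because the two loops share no state, each can be moved to C++ (or driven by uv_poll readability/writability events) independently, instead of being interleaved inside one flush.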
Thanks for the ongoing work. I'm really looking forward to the results :)
Thanks all for working on this issue. What is the current state of the solution for this issue?
+1 for the update on this issue
To be honest, no work has been done on moving _flush to native land. However, I still strongly believe it's the right thing to do.
I wonder if it's worth rethinking flush a bit and just keeping recv and send completely separate.
I feel like the complexity got multiplied by coupling these operations through flush
oh wait @prdn said that like a year ago
I also think we should split _flush into two methods: _send and _recv. The send method should only call send.
@reqshark @ronkorving please give this PR a try: https://github.com/JustinTulloss/zeromq.node/pull/492
I developed a simple test with Node.js and Python (pymq); an image and a repository with the code are attached. I wonder if there might be a bug in the implementation.
To my surprise, Node.js performance is very poor compared to Python.
Any suggestions as to why this is happening?
https://github.com/jmparra/two-queues/tree/zeroqm-nodejs
note: one client = one core.