trevj opened this issue 8 years ago
A few more notes:
The man page has a particularly salient bit under socket options:
SCTP_NODELAY
Turn on/off any Nagle-like algorithm. This means that packets are generally sent
as soon as possible and no unnecessary delays are introduced, at the cost of more
packets in the network. Expects an integer boolean flag.
So, possibly just figuring out how to properly set this while building may fix our issue. Otherwise, trying to copy the fix we did for Chrome into node is another path.
I think Nagle is already being disabled - https://github.com/js-platform/libwebrtc/blob/7848af36d86a4bbe8569693a1169123ad67c8905/talk/media/sctp/sctpdataengine.cc#L373
So, the fix may involve tuning buffer size - when we did that with Chrome, what was the ultimate resolution? Did we push upstream, or do we maintain our own config/build somewhere?
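For reference, here's a minimal sketch (not the libwebrtc code itself) of what those two knobs look like through usrsctp's public API; `sock` is assumed to come from `usrsctp_socket()`, and the buffer value is purely illustrative:

```cpp
#include <cstdint>
#include <cstdio>
#include <sys/socket.h>
#include <usrsctp.h>

// Sketch only: disable the Nagle-like algorithm (what sctpdataengine.cc
// already does at the line linked above) and enlarge the per-socket send
// buffer, which is the kind of tuning mentioned for Chrome.
void TuneSctpSocket(struct socket* sock) {
  uint32_t nodelay = 1;
  if (usrsctp_setsockopt(sock, IPPROTO_SCTP, SCTP_NODELAY,
                         &nodelay, sizeof(nodelay)) < 0) {
    perror("usrsctp_setsockopt(SCTP_NODELAY)");
  }

  int sndbuf = 1 << 20;  // 1 MiB; illustrative, not a value from the Chrome patch
  if (usrsctp_setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                         &sndbuf, sizeof(sndbuf)) < 0) {
    perror("usrsctp_setsockopt(SO_SNDBUF)");
  }
  // usrsctp also exposes global defaults (e.g. usrsctp_sysctl_set_sctp_sendspace)
  // that would have to be set before any sockets are created.
}
```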
I think I found commits in the official webrtc repo corresponding to Lally's work mentioned in the prior Chrome issue - https://chromium.googlesource.com/external/webrtc/+log/master/?s=3480728
If those are the fixes, then all we need to do is get the node-webrtc build process to be based on an adequately new libwebrtc (right now it's ca. 2014). I think it's worth trying to get this into the official node-webrtc, but we can probably figure it out ourselves first by building the library and then editing this - https://github.com/js-platform/node-webrtc/blob/develop/scripts/download-webrtc-libraries-and-headers.js
Possibly more relevant commit - https://chromium.googlesource.com/external/webrtc/+/5c6c6e026bbcc83ad546d00b41ab739dcc856c1b
Current plan of action - try to get node-webrtc building off newer webrtc.
So I think they're actually already building against a newer-ish webrtc (the 2014 repo is just stale legacy). https://github.com/markandrus/build-webrtc/blob/43b972651326c2ea3872896cf170320bbd89bd46/config.default.json#L7-L11
May still try doing a custom build, but I'm less confident that it will change anything.
Right, AFAICT we're already running code that includes the fix.
Perhaps node-webrtc doesn't use that exact same code path? The figures almost exactly match what we used to see in Chrome: https://github.com/uProxy/uproxy-lib/releases/tag/v35.0.0
Suggest we dig deeper into node-webrtc's "wrapping" C++ code.
FYI, here is the commit that seemed to fix Chrome: https://chromium.googlesource.com/external/webrtc.git/+/e8386d21992b6683b6fadd315ea631875b4256fb
Agreed re: the wrapping. https://github.com/js-platform/node-webrtc/blob/develop/src/peerconnection.cc is where I'm investigating from, and https://github.com/nodejs/nan may also be relevant.
I dug into build-webrtc itself, too.
Since the patch for Chrome essentially added a call to `SignalReadyToSend`, this seems relevant. We might be able to hack a solution at the build-webrtc layer? Need to trace `SignalReadyToSend` a bit first.
This also seems relevant: https://chromium.googlesource.com/external/webrtc.git/+/master/webrtc/media/sctp/sctpdataengine.cc#368 ("Fired on our I/O thread.")
What if there is no I/O thread on Node.js? Some `printf`s might help us answer that :-)
And here is where that callback is registered in usrsctplib: https://github.com/sctplab/usrsctp/blob/master/usrsctplib/user_socket.c#L1410
Again, we might even hack the callback at that level and avoid the whole I/O thread thing.
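To answer the I/O-thread question empirically, here's a hypothetical instrumentation sketch (the function name mirrors the send-threshold callback idea in sctpdataengine.cc, but this is not that code, and the exact callback signature varies between usrsctp/webrtc revisions): log the thread id at the point where usrsctp says there's room to send again, i.e. right before `SignalReadyToSend` would be raised.

```cpp
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <thread>
#include <usrsctp.h>

// Hypothetical instrumentation, not the real webrtc code. usrsctp invokes a
// send-threshold callback (registered via usrsctp_socket, see the
// user_socket.c link above) once enough send-buffer space has freed up.
// Logging the thread id here tells us which thread, if any, node-webrtc
// services this on.
static int SendThresholdCallback(struct socket* /*sock*/, uint32_t sb_free) {
  std::ostringstream id;
  id << std::this_thread::get_id();
  std::fprintf(stderr, "send threshold: sb_free=%u on thread %s\n",
               static_cast<unsigned>(sb_free), id.str().c_str());
  // The real code would notify the data channel so it can raise
  // SignalReadyToSend(true) on the right thread.
  return 0;
}
```

If nothing is logged under load when running inside Node.js, that would support the "no I/O thread servicing the callback" theory.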
If it's threading-related, another possibility is to increase the libuv thread pool size - http://stackoverflow.com/questions/22644328/when-is-the-thread-pool-used#22644735
Both https://github.com/js-platform/node-webrtc/blob/develop/src/datachannel.cc and https://github.com/js-platform/node-webrtc/blob/develop/src/peerconnection.cc make liberal use of libuv for async dispatch and mutex locking/unlocking. According to the above link, libuv has a default thread pool size of 4, but it can be increased, e.g. `process.env.UV_THREADPOOL_SIZE = 10;`.
Could be worth a shot, I'll read more about libuv in general in case there are other pertinent aspects.
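For context while reading those wrappers, here's a minimal sketch (illustrative, not node-webrtc's actual code) of the libuv pattern they use in spirit: a non-JS thread queues an event under a `uv_mutex_t` and calls `uv_async_send()`, and the callback drains the queue on the event-loop thread. One caveat: `uv_async` callbacks run on the event loop itself, not on the worker thread pool, so `UV_THREADPOOL_SIZE` may not affect this path at all.

```cpp
#include <uv.h>
#include <cstdio>
#include <queue>
#include <string>

// Sketch of the marshalling pattern: events produced on a non-JS thread are
// queued under a mutex, and uv_async_send() wakes the event loop so the
// callback drains the queue on the loop (JS) thread.
static uv_async_t g_async;
static uv_mutex_t g_lock;
static std::queue<std::string> g_events;

// Runs on the event-loop thread whenever uv_async_send() has been called.
static void DrainEvents(uv_async_t* /*handle*/) {
  uv_mutex_lock(&g_lock);
  while (!g_events.empty()) {
    std::printf("event on loop thread: %s\n", g_events.front().c_str());
    g_events.pop();
  }
  uv_mutex_unlock(&g_lock);
}

// May be called from any thread (e.g. a webrtc signaling/worker thread).
static void PostEvent(const std::string& ev) {
  uv_mutex_lock(&g_lock);
  g_events.push(ev);
  uv_mutex_unlock(&g_lock);
  uv_async_send(&g_async);  // safe to call from any thread
}

int main() {
  uv_mutex_init(&g_lock);
  uv_async_init(uv_default_loop(), &g_async, DrainEvents);
  PostEvent("hello from another thread (pretend)");
  // In node-webrtc the loop is Node's own and is already running; this is
  // only here so the sketch executes standalone.
  uv_run(uv_default_loop(), UV_RUN_ONCE);
  return 0;
}
```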
Nice find!
I ran this command:
UV_THREADPOOL_SIZE=25 time node examples/stress.js
I didn't see an improvement, but as discussed, let's come up with a test where we can add latency between the peers.
@trevj How are you running the Chrome and Firefox throughput tests?
Also, what units are the throughput figures in? When I run the stress send and receive as documented in the wiki, it takes ~13 seconds for the program to complete, with 18mbps throughput.
From https://github.com/uProxy/uproxy-docker/pull/96:
| setup          | latency | chrome | firefox | node |
|----------------|---------|--------|---------|------|
| no obfuscation | 0ms     | 14611  | 5781    | 1523 |
| no obfuscation | 150ms   | 604    | 612     | 225  |
| caesar         | 0ms     | 1289   | 1772    | 872  |
| caesar         | 150ms   | 535    | 532     | 192  |
Summary of our work to fix this in Chrome: https://github.com/uProxy/uproxy/issues/1339
At the uProxy layer, this might be relevant too: https://github.com/uProxy/uproxy/issues/1563