Closed sznowicki closed 4 years ago
Hi, thanks for reporting this, I will take a look into that this weekend.
There were some version bumps on dependencies and the addon itself is now compiled statically. So there were many changes that could add odd behavior.
If possible, post the output of calling this:
console.log(Curl.getVersionInfoString())
console.log(Curl.VERSION_NUM)
Different machine so I post all details: macOS 10.14.4 node v10.10.0 console logs:
Protocols: dict, file, ftp, ftps, gopher, http, https, imap, imaps, ldap, ldaps, pop3, pop3s, rtsp, scp, sftp, smb, smbs, smtp, smtps, telnet, tftp
Features: AsynchDNS, IDN, IPv6, Largefile, NTLM, NTLM_WB, SSL, libz, brotli, TLS-SRP, HTTP2, UnixSockets, HTTPS-proxy
I can confirm that I see the exact same behaviour running Jest tests.
@sznowicki if possible also run Curl.getVersion()
, I missed this one. It will give the version of libcurl and their dependencies.
Sure, sorry for a delay.
Protocols: dict, file, ftp, ftps, gopher, http, https, imap, imaps, ldap, ldaps, pop3, pop3s, rtsp, scp, sftp, smb, smbs, smtp, smtps, telnet, tftp
Features: AsynchDNS, IDN, IPv6, Largefile, NTLM, NTLM_WB, SSL, libz, brotli, TLS-SRP, HTTP2, UnixSockets, HTTPS-proxy
475137
libcurl/7.64.1 OpenSSL/1.1.1b zlib/1.2.11 brotli/1.0.7 libidn2/2.1.1 libssh2/1.8.2 nghttp2/1.34.0```
There is something weird going on for sure, I just tested the file you posted and it completes after 8s if you increase mocha's timeout.
The strange is that the on end
/ error
callback is called really fast, the delay is between the resolve()
/ reject()
and the then
/ catch
being triggered.
Interesting enough, this only affects the tests, there is no slowdown running the code directly.
I will try to find some time to profile it this week. The workaround, for now, is to increase the timeout for that specific test. You can do that by calling
this.timeout(10000)
Inside the describe.
thanks @JCMais for taking this issue seriously.
I'd say increasing timeout might work but since those kind of issues are sometimes symptoms of some serious stuff like overriding global variables I'll hold the upgrade on my end.
I've already checked out the code on my machine and when I have some more time I'd also try to look on this. Maybe I'll find something out.
I noticed this too! And this is happening outside of Mocha, on Windows 10, Node 12.5.0 within ts-node. My code is very similar to the test case posted above (wrapping libcurl with a promise), and the "end" event will be emitted, and the promise would not resolve until a second or two later.
Curiously, adding an event listener for data that console.logs the chunk "fixes" the issue:
curl.on("data", chunk => {
console.log(chunk);
});
Doing anything other than console.logging the chunk causes the "fix" to not work, including looping over the buffer and printing the sum of its bytes, console.log(chunk.toString())
, or even console.log(util.inspect(chunk))
. It must be console.log(chunk)
, console.error(chunk)
, or some function that eventually passes the chunk directly into console.log/console.error. π€
I spoke too soon - the console.log "fix" only works when it is running in the WebStorm debugger. I suspect there is a really subtle issue with the way node-libcurl is interacting with the event loop.
This is turning out to be very messy to debug - placing breakpoints within the end handler causes the issue to go away.
In the same program, there is a ReconnectingWebSocket. Commenting out the instantiation of ReconnectingWebSocket fixes the issue.
With the ReconnectingWebSocket instantiated with a URL that does not have a server running, it looks like there is something blocking the event loop:
curl.on("end", (statusCode, body) => {
console.log("END", Date.now());
curl.close();
console.log("CLOSE", Date.now());
process.nextTick(() => {
console.log("TICK", Date.now());
resolve({
statusCode,
body: body as string,
// headers
});
})
})
END 1561930626028
CLOSE 1561930626030
TICK 1561930628715
RESOLVE 1561930628715
Replacing the ReconnectingWebSocket with a normal ws
WebSocket:
END 1561930740436
CLOSE 1561930740437
TICK 1561930741973
Error: connect ECONNREFUSED 127.0.0.1:3000
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1056:14)
With a real WebSocket server running:
Connected to WebSocket server
END 1561930916999
CLOSE 1561930917000
TICK 1561930920992
RESOLVE 1561930920992
1561930920992
END 1561930921603
CLOSE 1561930921603
TICK 1561930922824
RESOLVE 1561930922824
1561930922824
When I drop setInterval(() => {}, 10);
into the file, the code works as expected:
Connected to WebSocket server
END 1561931032224
CLOSE 1561931032224
TICK 1561931032230
RESOLVE 1561931032230
1561931032230
END 1561931032534
CLOSE 1561931032534
TICK 1561931032540
RESOLVE 1561931032540
1561931032540
And with an interval of 500 ms:
Connected to WebSocket server
END 1561931103926
CLOSE 1561931103927
TICK 1561931104332
RESOLVE 1561931104332
1561931104332
END 1561931104490
CLOSE 1561931104490
TICK 1561931104832
RESOLVE 1561931104832
1561931104832
So the issue is that the event loop simply isn't ticking properly under certain conditions.
Wrapping the resolve in setImmediate
actually fixes the issue:
curl.on("end", (statusCode, body) => {
console.log("END", Date.now());
curl.close();
console.log("CLOSE", Date.now());
setImmediate(() => {
console.log("TICK", Date.now());
resolve({
statusCode,
body: body as string,
// headers
});
})
});
END 1561931280096
CLOSE 1561931280097
TICK 1561931280097
RESOLVE 1561931280098
1561931280098
END 1561931280271
CLOSE 1561931280271
TICK 1561931280272
RESOLVE 1561931280272
1561931280272
Explanation of process.nextTick
vs setImmediate
: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/
For those following along by email: the comment above has been updated with a band-aid solution to this bug.
tl;dr: wrap your resolve
call in setImmediate
.
Thanks @as-com for additional research. I'll still wait though until we know what's really the cause of this issue since it seems to be very mysterious.
@as-com this is awesome and shows we have a true Heisenbug!
@as-com thanks for investing some time to debug this issue.
I still have not found time to look into it myself, sorry.
For anyone trying to debug this more, something that would be interesting to try is to check if there is any slowdown on the native side. A really basic way to do that is just printing something from those two functions: https://github.com/JCMais/node-libcurl/blob/8eb18a8a1029a46f281801fb9b88d2bb751264de/src/Multi.cc#L141-L177 https://github.com/JCMais/node-libcurl/blob/8eb18a8a1029a46f281801fb9b88d2bb751264de/src/Multi.cc#L180-L199
And from this one on the JS side: https://github.com/JCMais/node-libcurl/blob/8eb18a8a1029a46f281801fb9b88d2bb751264de/lib/Curl.ts#L68-L83
The log inside the onMessage
above should immediately follow the call to the OnSocket
above.
Did what you suggested. Here's the code: https://github.com/JCMais/node-libcurl/compare/develop...sznowicki:177-debug?expand=1
I'm not sure how to analyse those but here's the console output:
szymon@MacBook-Air-Szymon uptime-check (node-libcurl-2) $ npm t
> uptime-check@2.0.0 test /Users/szymon/dev/uptime-check
> mocha --recursive
Promisified curl call test
Multi::OnSocket started
Multi will processMessages
Multi::OnSocket started
Multi will processMessages
Multi::OnSocket started
Multi will processMessages
Multi::OnSocket started
Multi will processMessages
Multi::OnSocket started
Multi will processMessages
Multi::OnSocket started
Multi will processMessages
Multi::OnSocket started
Multi will processMessages
multiHandle.onMessage starts!
PURE DONE
β should success, pure curl (438ms)
Multi::OnSocket started
Multi will processMessages
Multi::OnSocket started
Multi will processMessages
multiHandle.onMessage starts!
END END END, now resolve
1) should resolve
TEST DONE
Multi::OnSocket started
Multi will processMessages
multiHandle.onMessage starts!
END END END, now resolve
2) should resolve (async/await way)
ASYNC/AWAIT DONE
1 passing (4s)
2 failing
1) Promisified curl call test
should resolve:
Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/Users/szymon/dev/uptime-check/test/node-libcurl-issue-test.js)
2) Promisified curl call test
should resolve (async/await way):
Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/Users/szymon/dev/uptime-check/test/node-libcurl-issue-test.js)
npm ERR! Test failed. See above for more details.
I'm checking it again and what's interesting is that changing my test code like this:
curl.on('end', () => {
console.log('END END END, now resolve');
setTimeout(() => resolve(), 0);
// resolve()
});
Makes the tests green again. Which is really interesting.
Know what is even more interesting? You don't need to wrap the resolve/reject in a setTimeout
at all, just this will make it resolve instantly:
curl.on('end', () => {
console.log('END END END, now resolve');
resolve()
setTimeout(() => {}, 0) // or setImmediate(() => {})
});
π€
I am starting to think this is a really weird async iteration with Mocha.
Alright, I am pretty sure this is a Mocha issue, I just created a test case using Jest and it worked correctly, and running the code directly does not show any slowdown.
I've opened an issue on the Mocha repo
I don't think this is specific to Mocha, it happens when there is a ws
WebSocket client in the same process. There must be some weird thing going on with the way node-libcurl is interacting with the event loop.
@as-com can you post some code that reproduces it without using mocha?
@JCMais Turns out no WebSocket is even required. Here is some code that reproduces the issue:
import {Curl, CurlFeature} from "node-libcurl";
interface CurlHttpResponse {
statusCode: number,
body: string,
// headers: Buffer | HeaderInfo[]
}
setTimeout(() => {}, 5000);
const defaultCurl = new Curl();
defaultCurl.setOpt(Curl.option.SSL_VERIFYPEER, false);
function httpRequestComplicated(url: string): Promise<CurlHttpResponse> {
return new Promise((resolve, reject) => {
const curl = defaultCurl.dupHandle(false);
curl.setOpt(Curl.option.URL, url);
curl.enable(CurlFeature.NoHeaderStorage);
curl.enable(CurlFeature.NoHeaderParsing);
curl.on("error", (error, errorCode) => {
curl.close();
// @ts-ignore
error.errorCode = errorCode;
reject(error);
});
curl.on("end", (statusCode, body) => {
console.log("END", Date.now());
curl.close();
resolve({
statusCode,
body: body as string,
// headers
});
// console.log("CLOSE", Date.now());
// setImmediate(() => {
// // console.log("TICK", Date.now());
//
// })
});
curl.perform();
});
}
console.log("START", Date.now());
httpRequestComplicated("https://icanhazip.com").then(x => {
console.log(x, Date.now())
}).catch(console.error);
(sorry about the mess, I just quickly ripped it out of another codebase)
Thanks @as-com, looks like the close method is in some way blocking the event loop, but the call to close
itself does not cause any delay:
START 2019-07-28T18:21:58.241Z
END - before close 2019-07-28T18:21:58.316Z
END - after close 2019-07-28T18:21:58.318Z
RESOLVED 2019-07-28T18:22:03.241Z
Wrapping the call to curl.close()
in a setImmediate
makes the promise resolve without any delay. Commenting the setTimeout
you added makes the problem disappear, and even more weird looks like the promise is only resolved after the timeout finishes.
The original issue on the question is still weird, because it was not calling close
at all, and seems to run correctly on Jest and directly. π€
I will try to investigate it further
Edit: There is no need to wrap curl.close()
in a setImmediate
, even just calling setImmediate(() => {})
will make it resolve faster
I'm nowhere near to fully understand what the C code does here, but since it's a nice challenge I'd ask:
What this thing does in Multi::Dispose
?
uv_timer_stop(this->timeout.get());
This stops the libuv
timer so it does not call their callback anymore. It's not safe to let it call the callback after the Multi instance is disposed of because the memory pointed by the data passed to it will not be valid anymore, see:
This is all based on the libcurl usage in an event-based way, see the following links for more details:
https://curl.haxx.se/libcurl/c/libcurl-multi.html (MULTI_SOCKET
section)
https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
@as-com did you maybe find something? Is this issue causing you any troubles on production?
@sznowicki I am continuing to use the above stated workaround in production.
I've tried again, with no luck, to debug this issue. Opened an issue on nodejs/help
just in case someone else has a better idea on what might be causing this: https://github.com/nodejs/help/issues/2586
Well written question there. Thanks a lot for keeping track of this issue. Highly appreciated.
296 days later this has been finally fixed ππ₯³, for details see the issue above.
The fix is available on the prerelease version 2.1.0-4
, which can be installed with:
yarn add node-libcurl@next
If everything works correctly, I will release the 2.1.0
version at the start of next week, I will close this issue when I do that.
@JCMais this is awesome. We really want to move to node-libcurl at https://checklyhq.com but this was a big blocker
@JCMais big warm thank you for getting this done. Now it's time for me to finally migrate to @2 with my uptime-checker module.
Thank you!
Sorry if this is not a right place but I'm looping with a debugger for hours already and I think I am out of ideas of what can be wrong.
tldr; testing with mocha is highly unreliable, usually times out
Longer story
I'm using this package for a long time and first I'd like to thank all the contributors for amazing work you do. Thank you.
Today I decided to update my small library by bumping
node-libcurl
to latest version.I have small regression/integration test and I found most of them (sometimes all) fail after upgrade with no good reason.
I spent hours trying to detect the issue in my code but I'm pretty sure there is some bug in this code.
I've made a smallest as possible test to prove that: https://github.com/sznowicki/uptime-check/blob/node-libcurl-2/test/node-libcurl-issue-test.js
In this file you'll find three test cases: curl pure implementation, curl wrapped in a promise and called in traditional way, same wrapped curl called in async/await way.
While pure implementation always succeeds (also tried placing it last), Promise.then always times out, async/await usually times out but sometimes is surprisingly green (slow though).
To be honest, I don't have a good idea why this package would cause this behaviour from mocha, but I did try many ways of testing that to exclude other possibilities.
It just feels like something either overrides mocha's "done/await" checks.
On way of testing was to resolve early, before Curl is involved and this made the tests finishing again (also included in the test cases I posted). And I think this is a very good reason to post this issue here and not in the mocha repo.
If you have any questions, or concerns I promise to be active in this issue.
If I missed something stupid and obvious, I'd like to apologise in advance.
PS: maybe it matters: node v10.2.1
EDIT: