JCMais / node-libcurl

libcurl bindings for Node.js
https://npmjs.org/package/node-libcurl
MIT License
660 stars 117 forks

Curly - Close connection - Timeout Exception #405

Closed jeremydenoun closed 7 months ago

jeremydenoun commented 8 months ago

Describe the bug

Hello everyone,

I'm encountering an issue with curly (it should be the same with Curl directly). I fetch a large file with curly in stream response mode and await the end of the transfer with this kind of call:

    const fs = require('fs');
    const { curly } = require('node-libcurl');
    const { pipeline: streamPipeline } = require('stream/promises');

    const {
        statusCode: _statusCode,
        data: stream,
        headers: _headers,
    } = await curly.get(url, {
        curlyStreamResponse: true,
        curlyResponseBodyParsers: false,
        NOPROGRESS: true,
        FOLLOWLOCATION: true,
        SSL_VERIFYHOST: false,
        SSL_VERIFYPEER: false,
        MAXREDIRS: 5,
        CONNECTTIMEOUT: 20,
        DNS_CACHE_TIMEOUT: 10,
        TIMEOUT: 3600 * 10,
        SERVER_RESPONSE_TIMEOUT: 60 * 30,
        FAILONERROR: true,
        curlyProgressCallback: cbProgress,
    });
    const writeStream = fs.createWriteStream(output);
    writeStream.on('error', function (e) { console.error(e); });
    await streamPipeline(stream, writeStream);

Under normal conditions everything works fine: after streamPipeline everything is OK, or I get an exception on the stream (handled by writeStream.on('error', ...)) or caught at the parent level with .catch(async (error) => {}) when a normal error happens (404, connect timeout, ...). But when the server side is under high load, I get unwanted behavior.

In the high-load case, the download finishes successfully: the file has the right size (and is valid), the awaited callbacks fire without any exception, and program execution continues. However, after a variable period, the event loop receives an exception that no catch defined at the "curl" level can handle, because the related object no longer exists.

It is a "Timeout error" (28, isCurlError: true), and it breaks the event loop as an unhandled exception while I'm already in another part of the process. The direct impact is that it kills the current process, leaving a zombie (because I spawn a new process per file download). Reading the node-libcurl source, in stream mode an error should be forwarded to the stream; but in my case the stream is already closed, which is maybe why it gets emitted globally.

Fortunately, I have a retry mechanism to handle this, but the zombie process remains and I can't do anything about it except restart the whole process.
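As a stopgap, a process-level handler can keep a stray error like this from taking the whole process down. This is only a sketch, under the assumption (from the description above) that the late error carries isCurlError and code 28 (CURLE_OPERATION_TIMEDOUT); Node treats 'uncaughtException' as a last resort, so the filter is deliberately narrow:

```javascript
// Sketch: last-resort handler for errors that surface on the event loop
// after the request's own handlers are gone. Only the libcurl timeout
// shape described above is swallowed; everything else stays fatal.
process.on('uncaughtException', (err) => {
  if (err && err.isCurlError && err.code === 28) {
    console.error('late libcurl timeout, ignoring:', err.message);
  } else {
    console.error('fatal:', err);
    process.exit(1);
  }
});
```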

I have already tried a few things.

For information, I have already seen this kind of behavior with libcurl elsewhere (command line, Python bindings), and it may be related to https://github.com/curl/curl/issues/3665; but in other languages it isn't a problem, because an exception raised outside the current function is simply discarded.

So I'd appreciate any tips or ideas you might have to fix this.

To Reproduce

Download a large file in stream mode, with a server that keeps the connection open and does not close it after the Content-Length bytes have been sent. Then wait for a timeout.

Version information:

Version: tested with 3.0.0 and the 3.0.1 pre-release

Version: libcurl/7.86.0 OpenSSL/3.0.12 zlib/1.2.13.1-motley brotli/1.0.9 zstd/1.4.9 libidn2/2.1.1 libssh2/1.10.0 nghttp2/1.57.0
Protocols: dict, file, ftp, ftps, gopher, gophers, http, https, imap, imaps, ldap, ldaps, mqtt, pop3, pop3s, rtsp, scp, sftp, smb, smbs, smtp, smtps, telnet, tftp
Features: AsynchDNS, Debug, TrackMemory, IDN, IPv6, Largefile, NTLM, NTLM_WB, SSL, libz, brotli, TLS-SRP, HTTP2, UnixSockets, HTTPS-proxy, alt-svc

OS: Debian 12.4, Node.js version: 20

Thanks

jeremydenoun commented 7 months ago

I found a workaround: I use a fork to handle the node-libcurl request and await the child's exit; after the exit, nothing can reach the main event loop anymore. (To avoid any issue, I use process.send / on("message") to transmit the request, and any exception, between parent and child.)

Feel free to close the ticket if you don't want to investigate further.

Thanks

JCMais commented 7 months ago

@jeremydenoun I tried to reproduce the issue but had no luck. Are you able to provide code that somehow reproduces it for you?

Maybe using a test file from some online provider, such as this one: https://ash-speed.hetzner.com/

jeremydenoun commented 7 months ago

I tried to reproduce it with hamms and mse6, without success either.

I can't find a way to easily simulate a slow TCP close, or a missing RST/FIN after the complete payload has been sent. And that is only my suspicion, because the issue was occasional, only on big transfers, and only under high network congestion; it could also be connection re-use, ordering, or some other low-level issue.

Since my fork workaround, no problems: everything works as expected (I suspect the real issue is just hidden behind it).

I plan to work on a load-testing project where this case could appear again. I propose we close this ticket; if I find something concrete (and potentially a solution), I will open a new one.

Thanks