Ossianaa / node-libcurl

node-libcurl is a request library that mimics Chrome JA3 fingerprints.
https://www.npmjs.com/package/@ossiana/node-libcurl
MIT License

Memory leaks? #28

Open iamoskvin opened 1 week ago

iamoskvin commented 1 week ago

I have a memory leak in my Node.js script, and I suspect it is caused by node-libcurl. Have you seen anything similar? Any advice? Thanks.

Ossianaa commented 1 week ago

Do you have simple code that can reproduce it?

iamoskvin commented 6 days ago

> Do you have simple code that can reproduce it?

Probably the code below (run with node --expose-gc). heapUsed and heapTotal grow endlessly. The real app ate 500 MB (with 100 curl instances) before I killed it, so the leak is substantial.

import { LibCurl, fetch } from "@ossiana/node-libcurl";

const curl = new LibCurl();

const repro = async () => {
  for (let i = 0; i < 5000; i++) {
    let resp = await fetch(
      "https://raw.githubusercontent.com/nodejs/node/main/doc/changelogs/CHANGELOG_V20.md",
      {
        headers: { "Cache-Control": "no-cache" },
        instance: curl, // reuse a single LibCurl instance across requests
      }
    );
    await resp.text();
    resp = null; // drop the reference so the response can be collected
    const mem = process.memoryUsage();
    console.log(mem.heapUsed, mem.heapTotal);
    global.gc(); // force a collection (requires --expose-gc)
  }
};

repro()
  .then(() => console.log("Finished"))
  .catch((err) => console.log("Error", err));

iamoskvin commented 6 days ago

If I run this code with the inspect flag (node --expose-gc --inspect-brk), memory consumption increases linearly.

But if I run similar code with native fetch, memory is reclaimed regularly and consumption stays roughly constant.
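
For reference, the native-fetch control looks roughly like this (a sketch mirroring the repro above, using Node's built-in fetch, available since Node 18):

// Control: same loop and URL, but Node's global fetch instead of node-libcurl.
const reproNative = async () => {
  for (let i = 0; i < 5000; i++) {
    const resp = await fetch(
      "https://raw.githubusercontent.com/nodejs/node/main/doc/changelogs/CHANGELOG_V20.md",
      { headers: { "Cache-Control": "no-cache" } }
    );
    await resp.text();
    const mem = process.memoryUsage();
    console.log(mem.heapUsed, mem.heapTotal);
    global.gc(); // requires --expose-gc
  }
};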

Ossianaa commented 6 days ago

> If I run this code with the inspect flag (node --expose-gc --inspect-brk), memory consumption increases linearly.
>
> But if I run similar code with native fetch, memory is reclaimed regularly and consumption stays roughly constant.

What's your system, and which node-libcurl version?

iamoskvin commented 6 days ago

> > If I run this code with the inspect flag (node --expose-gc --inspect-brk), memory consumption increases linearly. But if I run similar code with native fetch, memory is reclaimed regularly and consumption stays roughly constant.
>
> What's your system, and which node-libcurl version?

Linux Mint 22, but I am also running the script on an Ubuntu 22.04.3 server. node-libcurl 1.6.5 (because I want to reuse connections, which you disabled in the next version). I can test other versions if needed.

Ossianaa commented 6 days ago

> > > If I run this code with the inspect flag (node --expose-gc --inspect-brk), memory consumption increases linearly. But if I run similar code with native fetch, memory is reclaimed regularly and consumption stays roughly constant.
> >
> > What's your system, and which node-libcurl version?
>
> Linux Mint 22, but I am also running the script on an Ubuntu 22.04.3 server. node-libcurl 1.6.5 (because I want to reuse connections, which you disabled in the next version). I can test other versions if needed.

const initMem = process.memoryUsage(); // capture a baseline at startup

setInterval(() => {
    const afterMem = process.memoryUsage();
    console.log(
        afterMem,
        afterMem.heapUsed - initMem.heapUsed,
        afterMem.heapTotal - initMem.heapTotal,
    );
}, 1e3);

You can append this code for testing. I think Node.js's GC is lazy, and my guess is that the difference in memory you see after a few seconds is a cache Node.js keeps for performance optimization.
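
To rule lazy GC in or out, you can force a collection right before sampling; a minimal sketch (names are illustrative, requires node --expose-gc):

// Force a full GC before each sample so deferred collection doesn't skew the numbers.
// Run with: node --expose-gc script.js
const sampleHeap = (label) => {
    if (global.gc) global.gc();
    const { heapUsed, heapTotal } = process.memoryUsage();
    console.log(label, heapUsed, heapTotal);
};

If heapUsed keeps climbing even immediately after a forced GC, the growth is retained memory, not just uncollected garbage.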

iamoskvin commented 6 days ago

> initMem

Is initMem measured just at script start? I was running the script for several minutes (10,000 requests), not seconds.

Ossianaa commented 6 days ago

import { LibCurl, fetch } from "@ossiana/node-libcurl";

// Runs taskExecutor with bounded concurrency until shouldStop(counter) returns true.
class ConcurrencyRunner {
    #concurrencyLimit = 0;
    #taskExecutor = null;
    #shouldStop = null;
    constructor({
        concurrencyLimit = 1,
        taskExecutor,
        shouldStop = () => false,
    } = {}) {
        if (typeof concurrencyLimit !== "number" || concurrencyLimit < 1) {
            throw new Error("concurrencyLimit must be a positive number");
        }
        if (typeof taskExecutor !== "function") {
            throw new Error("taskExecutor must be a function");
        }
        if (typeof shouldStop !== "function") {
            throw new Error("shouldStop must be a function");
        }

        this.#concurrencyLimit = concurrencyLimit;
        this.#taskExecutor = taskExecutor;
        this.#shouldStop = shouldStop;
    }

    async run() {
        let counter = 0;
        const tasks = new Set();

        while (true) {
            if (await this.#shouldStop(counter)) {
                break;
            }

            if (tasks.size < this.#concurrencyLimit) {
                const taskPromise = Promise.resolve()
                    .then(() => this.#taskExecutor(counter))
                    .catch((error) => {
                        console.error("ConcurrencyRunner.run", error);
                    })
                    .finally(() => {
                        tasks.delete(taskPromise);
                    });

                tasks.add(taskPromise);
                counter++;
            } else {
                // At capacity: wait for any in-flight task to settle first.
                await Promise.race(tasks);
            }
        }

        await Promise.all(tasks);
    }
}
let initMem;
const repro = async () => {
    initMem = process.memoryUsage();
    await new ConcurrencyRunner({
        concurrencyLimit: 20,
        async taskExecutor() {
            let resp = await fetch(
                "https://raw.githubusercontent.com/nodejs/node/main/doc/changelogs/CHANGELOG_V20.md",
                {
                    headers: { "Cache-Control": "no-cache" },
                },
            );
            await resp.text();
        },
        shouldStop(t) {
            console.log(t);
            return t === 5000;
        },
    }).run();
    const afterMem = process.memoryUsage();
    console.log(
        initMem,
        afterMem,
        afterMem.heapUsed - initMem.heapUsed,
        afterMem.heapTotal - initMem.heapTotal,
    );
};

repro()
    .then(() => console.log("Finished"))
    .catch((err) => console.log("Error", err));

setInterval(() => {
    const afterMem = process.memoryUsage();
    console.log(
        afterMem,
        afterMem.heapUsed - initMem.heapUsed,
        afterMem.heapTotal - initMem.heapTotal,
    );
}, 1e3);

iamoskvin commented 5 days ago

Thanks, I ran a lot of tests, including your code. I think the simplest test is the code below. If I read the response in any way (resp.text() or resp.arrayBuffer()), I see a leak: [screenshots: heap keeps growing]

Without reading the response, there is no problem: [screenshot: memory stays flat]

And when reading the response with native fetch (Node.js 23), I also don't see a memory leak: [screenshot: memory stays flat]

What are your thoughts?

import { LibCurl, fetch } from "@ossiana/node-libcurl";
const curl = new LibCurl();

const repro = async () => {
  for (let i = 0; i < 5000; i++) {
    const resp = await fetch(
      "https://raw.githubusercontent.com/nodejs/node/main/doc/changelogs/CHANGELOG_V20.md",
      {
        headers: { "Cache-Control": "no-cache" },
        instance: curl,
      }
    );
    await resp.text(); // reading the body is what appears to trigger the leak
  }
};

repro()
  .then(() => console.log("Finished"))
  .catch((err) => console.log("Error", err));

setInterval(() => {
  const afterMem = process.memoryUsage();
  console.log(afterMem.heapUsed, afterMem.heapTotal);
}, 1e3);
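
Since node-libcurl wraps a native library, it might also be worth logging rss, external, and arrayBuffers, not just the JS heap; memory held on the native side won't necessarily show up in heapUsed. A small sketch:

setInterval(() => {
    // rss includes native allocations; external/arrayBuffers cover memory
    // owned by C++ objects and ArrayBuffers referenced from JS.
    const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
    console.log({ rss, heapUsed, external, arrayBuffers });
}, 1e3);
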
iamoskvin commented 5 days ago

Any ideas? It looks like a problem with how the library reads the response body...
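
For diagnosis, one option is to capture V8 heap snapshots before and after the loop and diff them in Chrome DevTools; a sketch reusing the repro() defined above (writeHeapSnapshot is part of Node's built-in v8 module):

import { writeHeapSnapshot } from "v8";

// Take a snapshot before and after the repro loop, then load both
// .heapsnapshot files in Chrome DevTools (Memory tab) and compare
// them to see which objects are being retained.
writeHeapSnapshot("before.heapsnapshot");
await repro();
global.gc(); // settle the heap first (requires --expose-gc)
writeHeapSnapshot("after.heapsnapshot");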

Ossianaa commented 4 days ago

> Any ideas? It looks like a problem with how the library reads the response body...

I'll find time to check it.