Closed reinhrst closed 7 months ago
I was really trying to rule out all situations; however, it seems I missed something. ffmpeg() reads the file in 32kB blocks:
const BLOCKSIZE = 32 * 1024;
let sum = 0;
for (let i = 0; i < input.size; i += BLOCKSIZE) {
    const data = await input.slice(i, Math.min(input.size, i + BLOCKSIZE)).arrayBuffer();
    const ints = new Uint8Array(data);
    sum = [...ints.slice(100, 110)].reduce((a, b) => a + b, sum);
}
const exit_code = 0;
This does result in 2.9GB of memory consumed. These 130GB of data are about 4M reads, so that is roughly 700 bytes of memory retained per read. This happens in Chrome 119.0.6045.123, and given this finding I now feel it's a bug in Chrome....
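The per-read figure can be sanity-checked directly from the reported totals (a quick back-of-the-envelope calculation, not a measurement):

```javascript
// Sanity-checking the numbers above (values taken from the report; the
// per-read figure is a back-of-the-envelope estimate).
const totalBytes  = 130 * 1024 ** 3;  // ~130GB of input data
const blockSize   = 32 * 1024;        // ffmpeg reads 32kB at a time
const leakedBytes = 2.9 * 1024 ** 3;  // ~2.9GB retained afterwards

const reads = totalBytes / blockSize;    // 4,259,840 reads, i.e. "about 4M"
const leakPerRead = leakedBytes / reads; // ~731 bytes retained per read
console.log(reads, Math.round(leakPerRead));
```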
Just wondering: is there a simple way to convince ffmpeg to read blocks larger than 32kB?
It is entirely possible that ffmpeg leaks memory. I had to manually rewrite it to clean up after itself, because its traditional method of cleaning up after itself is exiting :). I probably just missed something.
That being said, if you're observing this without calling ffmpeg(), with just that loop (which obviously should not leak), then that seems potentially like a bug in Chrome. Or, potentially it's intended but non-obvious behavior: the GC, seeing increasing allocator pressure, may simply decide to keep more pools reserved even if nothing is actually leaking.
If you look near the top of libav.js's Makefile, there are options to compile it with memory debugging, which includes some (very limited) leak detection. It may be worth running that way to see if it can find an actual leak.
I'm not sure where the default size to read at a time actually comes from. There's -bufsize, but I think that's only an output option.
Thanks for the pointers. Indeed, -bufsize does not have any effect. I'll close this for now since it indeed looks like Chrome is to blame.
I'm using libav.js to convert large video files, by calling the function below multiple times:
The function is called on the main thread, but libav() fires off a new worker for each function call. I can see that every function call leads to its own worker (in Chrome's task manager), which ends towards the end of the function (thanks to libav.terminate()).

This method seems to leak about 100MB of memory per 4GB file that I convert. After 56 files (130GB in total), the page holds about 2.9GB of memory (see screenshot from the task manager). It takes closing and reopening the tab to free this memory (reloading the page does not do it).
The question is whether I'm missing something like ffmpeg_cleanup() to clean up memory used by the ffmpeg() command. I'm also not familiar enough with Workers to know whether terminating a worker should clean up all of its memory (including memory allocated in wasm).

Side note: let's be clear, this is not the end of the world, and it's easy to instruct users to close and reopen the tab after 100GB of file remuxing, but I'd rather do it the right way.
I also understand that I should probably run this whole function in its own worker (with {noworker: true} for the LibAV() constructor), so I could use synchronous IO for writing the output file (I imagine the current method will be a disaster if the output IO is slower than the input IO), but this is just a POC.

Things I've tried to narrow down the problem:
- Rule out that it has something to do with the write code: removing the body of the libav.onwrite function does not change anything.
- See how ffprobe does, for a command that doesn't load the whole file: replacing the call to ffmpeg() by ffprobe() means no measurable memory leak (as long as the whole file is not read).
- See how ffprobe does, for a command that does load the whole file: replacing the call to ffmpeg() by ffprobe() does lead to 2.9GB being leaked.
- See if things change when not using workers: putting {noworker: true} in the LibAV() constructor has the effect of not using workers (I can see no workers in the task manager), but it still consumes memory: about 2.8GB.
- Rule out that the leak is in reading from File: I replaced the ffmpeg() code with a read-only loop over the input. In that case there is no memory leak:
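A read-only replacement of that kind can be sketched as follows (a minimal, assumed version, not the exact snippet used; note that the follow-up comment found 32kB chunks reproduced the growth, so a larger chunk size is used here):

```javascript
// Sketch: read the input in fixed-size chunks without involving ffmpeg() at all.
const BLOCKSIZE = 1024 * 1024; // 1MB per read (32kB reads reproduced the growth)

async function checksum(blob) {
  let sum = 0;
  for (let i = 0; i < blob.size; i += BLOCKSIZE) {
    const buf = await blob.slice(i, Math.min(blob.size, i + BLOCKSIZE)).arrayBuffer();
    const bytes = new Uint8Array(buf);
    for (const b of bytes) sum = (sum + b) % 0xffffffff;
  }
  return sum;
}

// In the browser, `blob` would be the user-selected File; here an in-memory
// Blob stands in for it (Blob is global in Node 18+).
const demo = new Blob([new Uint8Array(3 * 1024 * 1024).fill(1)]);
const result = checksum(demo);
result.then(s => console.log('checksum', s));
```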