32kB ffmpeg reads slow things down (and eat up memory)

reinhrst commented 6 months ago

As far as I can see, ffmpeg reads the file in 32kB blocks. This means that for a 4GB video file, there are about 100k read calls. These read calls slow things down (at least on Chrome 119.0.6045.199 / macOS) and, in addition, seem to eat up memory.
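As a rough check of that order of magnitude, the call counts can be worked out directly. This is only back-of-the-envelope arithmetic, assuming binary units (1kB = 1024 bytes); readCallCount is a name made up for illustration:

```javascript
// Hypothetical helper: how many fixed-size reads does it take to cover a file?
function readCallCount(fileSize, blockSize) {
  return Math.ceil(fileSize / blockSize);
}

const KiB = 1024;
const MiB = 1024 * KiB;
const GiB = 1024 * MiB;

console.log(readCallCount(4 * GiB, 32 * KiB)); // 131072
console.log(readCallCount(4 * GiB, 5 * MiB));  // 820
```

So "about 100k" reads at 32kB, versus under 1k reads at 5MB, is the right ballpark for a 4GB file.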

I did a quick test with a custom block reader that reads the file (a JS File object, drag-and-dropped into the page) in 5MB chunks and then serves 32kB views to libav. This cuts the File.slice() operations down to about 1k, while libav.onread() is still called about 100k times. With that change, the time to remux one file dropped from 58 seconds to 41 seconds (including writing it out on the other side). This was with {noworker: true}, running in the page (not in a worker). I have a future TODO to get this running in a worker and then compare the times (and see what sync IO may do).
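The chunked reader described above could look roughly like the following. This is a minimal sketch, not the code used for the timings: ChunkedBlobReader and its fields are hypothetical names, it assumes reads are issued one at a time, and the wiring to libav's read callback is deliberately left out.

```javascript
// Hypothetical helper, not part of libav.js: caches one large chunk of a
// Blob/File and serves small reads out of it.
class ChunkedBlobReader {
  constructor(blob, chunkSize = 5 * 1024 * 1024) {
    this.blob = blob;
    this.chunkSize = chunkSize;
    this.chunkStart = -1; // file offset of the cached chunk (-1: nothing cached)
    this.chunk = null;    // Uint8Array holding the cached chunk
    this.sliceCalls = 0;  // number of Blob.slice() calls made so far
  }

  // Returns a Uint8Array view of up to `length` bytes starting at `pos`.
  async read(pos, length) {
    const start = Math.floor(pos / this.chunkSize) * this.chunkSize;
    if (start !== this.chunkStart) {
      // Cache miss: fetch the whole chunk that contains `pos`.
      this.sliceCalls++;
      const buf = await this.blob.slice(start, start + this.chunkSize).arrayBuffer();
      this.chunk = new Uint8Array(buf);
      this.chunkStart = start;
    }
    const offset = pos - start;
    // subarray() creates a view, not a copy. It is clamped to the end of the
    // chunk, so a read that straddles a chunk boundary comes back short.
    return this.chunk.subarray(offset, offset + length);
  }
}
```

With a 5MB (5 * 1024 * 1024 byte) chunk and aligned 32kB reads, each chunk holds exactly 160 blocks, so aligned reads never straddle a chunk boundary; unaligned callers would need to handle short reads.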

However, it seems to me that a quick win might be to convince ffmpeg() to read in larger blocks than 32kB. All those calls to libav.onread() and libav.ff_block_reader_dev_send() are not free either, and these days I assume that not many devices with less than 512MB of memory will be running libav.js anyway.

I understand from #38 that it's not clear where the 32kB read size is defined, so we might need to do some digging to find this; I just wanted to log this here and collect thoughts about this (before doing the actual digging).

Yahweasel commented 6 months ago

Just out of curiosity, how does mkreadaheadfile do by comparison, if you're using Files?

reinhrst commented 6 months ago

So it's a bit unclear to me what mkreadaheadfile actually does. When I override File.prototype.slice to log what is requested, the requests are still for 32kB chunks.
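For anyone wanting to repeat that check, the logging override can be sketched as below. installSliceLogger is a hypothetical helper name; the example patches Blob.prototype (File inherits slice from Blob), which also lets it run outside a browser.

```javascript
// Hypothetical debugging helper: wraps `slice` on a prototype and records
// every requested byte range. Returns the log and a function to undo the patch.
function installSliceLogger(proto) {
  const original = proto.slice;
  const log = [];
  proto.slice = function (start, end) {
    log.push({ start, end });
    // Forward all arguments (including the optional content type) unchanged.
    return original.apply(this, arguments);
  };
  return { log, restore() { proto.slice = original; } };
}

// In a browser, File.prototype could be passed instead of Blob.prototype.
const { log, restore } = installSliceLogger(Blob.prototype);
new Blob([new Uint8Array(100000)]).slice(0, 32768);
restore();
console.log(log); // one entry: start 0, end 32768
```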

Actually the test above (the 58 seconds) was done with mkreadaheadfile.

Yahweasel commented 6 months ago

It just does the reads ahead, that's all :). I.e., when ffmpeg requests position 0, the mkreadahead reader sends that, and then starts the promise to read position 32768. I never anticipated that it would make the I/O faster than processing, but it's supposed to slightly make up for the I/O latency. I suppose if readahead files were read in larger chunks, that may actually be sufficient; it depends whether the problem is the actual reads, or just ffmpeg doing things in a chunky, inefficient way. There is no particular reason why readahead files only read ahead one chunk.
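To illustrate the idea of reading ahead more than one chunk, here is a minimal sketch. It is not libav.js's actual mkreadaheadfile implementation; ReadaheadReader, depth, and pending are hypothetical names, and error handling for failed prefetches is omitted.

```javascript
// Hypothetical sketch of a reader that keeps several chunk reads in flight.
class ReadaheadReader {
  constructor(blob, chunkSize = 32768, depth = 4) {
    this.blob = blob;
    this.chunkSize = chunkSize;
    this.depth = depth;       // how many chunks to request ahead of the current one
    this.pending = new Map(); // chunk position -> Promise<Uint8Array>
  }

  // Start (or reuse) the read for the chunk at `pos`.
  fetch(pos) {
    let p = this.pending.get(pos);
    if (!p) {
      p = this.blob.slice(pos, pos + this.chunkSize).arrayBuffer()
        .then((buf) => new Uint8Array(buf));
      this.pending.set(pos, p);
    }
    return p;
  }

  // Return the chunk at `pos` and queue up the next `depth` chunks.
  async read(pos) {
    const current = this.fetch(pos);
    for (let i = 1; i <= this.depth; i++) {
      const next = pos + i * this.chunkSize;
      if (next < this.blob.size) this.fetch(next);
    }
    const data = await current;
    this.pending.delete(pos); // consumed, so drop the reference
    return data;
  }
}
```

Note that this sketch never evicts prefetched chunks that end up unused (for example after a seek), so a real version would need some cap on the size of the pending map.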

(I don't pay a lot of attention to the "whole file" use case, since it's not how I use libav.js. I never use the CLI functions myself.)

reinhrst commented 6 months ago

Makes sense. Indeed if mkreadahead were to load larger chunks, it would make some difference (at the price of a bit more complexity in mkreadahead). Basically that would then be more or less as fast as my best test (the 41 seconds).

I don't know how much could be won by convincing ffmpeg itself to read larger chunks.

I'd be more than happy to do a deep-dive into all this, in a couple of weeks (first need to finish a working demo of my app :)), and see what I can find out.

Yahweasel commented 5 months ago

@reinhrst Is this still on your todo list? Shall I keep this open?

reinhrst commented 5 months ago

Thanks for prodding me. It's on my to-do list, but as always these lists tend to fill up with higher-prio items... I will keep it on my list; you can either close this and I will reopen once I know a bit more, or you can leave this open -- I understand the rationale for both :)

Yahweasel commented 4 months ago

(Closing this for now, reopen if you make progress or have further discussion on this topic.)

Yahweasel / libav.js

32kB ffmpeg reads slow things down (and eat up memory) #42