alexheretic / vimg

CLI for video images. Generates animated video contact sheets fast.
MIT License

My initial feedback #2

Open veikk0 opened 1 year ago

veikk0 commented 1 year ago

First, thanks for yet another wonderful tool. This is one of those things I didn't know I needed before seeing it. So far I've been using vcs by Toni Corvera to generate static video contact sheets, but animated ones really take things to the next level.

I've got some initial feedback and thought I'd write an all-in-one rather than submit multiple issue reports. Most of this is about animated sheets; I haven't tried static ones yet.

And I just realized that vimg has separate "extract" and "join" commands. So I can test sharpening and different encoder settings myself and report my results.

alexheretic commented 1 year ago

Thanks for the feedback!

The output being in slow-motion by default was confusing for me

Yeah, I can understand this. My use case is slowmo (I thought it looked better), so I just left that as the default. I'd be happy to make realtime the default; it does seem the logical choice :sweat_smile:

Encoder tuning

Right now encoding is done with libaom-av1, because svt-av1 has much less flexible support for various dimensions that the contact sheets could end up as. We should be able to expose the encoder options to the cli, though, to allow tweaks like this.

We could also allow just exporting all joined frames as .bmp files and exiting before the final encoding phase.

Applying sharpening to the frames post-downscale would be something to look into

Interesting. There is a --vfilter option (which is applied when extracting the frames from the input video), so that could be used for this, right? I suppose I should make the docs clearer that this vfilter is for extraction, not the avif encoding.

Having a video metadata header in the contact sheet would be nice

This is actually a sort of "non-goal": I'd rather avoid coding for headers or footers. Perhaps this kind of thing can be added to an image or video afterwards without too much hassle.

And I just realized that vimg has separate "extract" and "join" commands.

Yep, I'm trying to make the cli fairly modular so users can pick and choose the bits they want to use, and vimg itself doesn't need to handle absolutely everything. I'm not 100% sure how useful the separate commands are in the end, though.

alexheretic commented 1 year ago

the user is expected to be aware of the frame rate of their input video and utilize the --avif-fps, --capture-frames, and --capture-time switches to tell the program how many frames to capture and how quickly to play them back.

This isn't quite true, the input frame rate doesn't actually matter, at least not so much. If you pick --capture-time 1s --capture-frames 30 --avif-fps 30 you'll get a 30fps avif no matter the input framerate.

With --capture-time X --capture-frames Y you are saying generate Y frames over input video time X. Frames might get duplicated here if you're asking for more frames than the input fps can provide (so in that sense it doesn't make sense to ask for more frames than the fps).

--avif-fps Z then simply plays each vcs-frame at the given speed. So if you've produced 30 frames over 1s then --avif-fps 30 is realtime. If you produced 60 frames over 3s then --avif-fps 20 would be realtime.
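The arithmetic behind those three switches can be sketched in a few lines of Python. This is only an illustration of the relationship described above, not vimg's actual code; the function names are made up for the example, and the distinct-frame figure is a rough upper bound that assumes a constant input frame rate.

```python
import math


def realtime_fps(capture_frames: int, capture_time_s: float) -> float:
    """Playback rate at which the captured frames span their original duration."""
    if capture_time_s <= 0:
        raise ValueError("capture_time_s must be positive")
    return capture_frames / capture_time_s


def playback_speed(capture_frames: int, capture_time_s: float, avif_fps: float) -> float:
    """Speed relative to realtime: 1.0 is realtime, below 1.0 is slow motion."""
    return avif_fps / realtime_fps(capture_frames, capture_time_s)


def max_distinct_frames(input_fps: float, capture_time_s: float) -> int:
    """Rough upper bound on distinct source frames in the capture window.

    Asking for more capture frames than this means some get duplicated.
    """
    return math.floor(input_fps * capture_time_s)


print(realtime_fps(30, 1))          # 30.0
print(realtime_fps(60, 3))          # 20.0
print(playback_speed(60, 3, 10))    # 0.5, i.e. half speed
print(max_distinct_frames(24, 1))   # 24
```

So capturing 60 frames over 3s and playing them back at 10 fps gives half-speed slow motion, and asking for 30 frames over 1s of a 24 fps input would duplicate some frames.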

veikk0 commented 1 year ago

If you pick --capture-time 1s --capture-frames 30 --avif-fps 30 you'll get a 30fps avif no matter the input framerate.

Thanks, I understand how it works better now.

svt-av1 has much less flexible support for various dimensions that the contact sheets could end up as

I actually remember commenting about this on the SVT-AV1 bug tracker a couple years ago. I think since 0.9.0, SVT-AV1 has handled non-standard dimensions better, but when re-encoding one of the vcs sheets, I got the error "Source Width must be even for YUV_420 colorspace". This reminded me of a similar x264 requirement, and one of the mentioned workarounds of making width and height divisible by two was also effective for SVT-AV1:

-vf "scale=trunc(iw/2)*2:trunc(ih/2)*2"
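For anyone unfamiliar with that workaround, the filter rounds each dimension down to the nearest even number. Here is the same arithmetic as a small Python sketch; the function names are hypothetical, and this only illustrates the rounding, not how FFmpeg evaluates the expression internally.

```python
def round_down_to_even(n: int) -> int:
    """Equivalent of trunc(n/2)*2 for a non-negative integer dimension."""
    return (n // 2) * 2


def even_dimensions(width: int, height: int) -> tuple[int, int]:
    """Largest even width and height that fit inside the given size."""
    return round_down_to_even(width), round_down_to_even(height)


print(even_dimensions(1921, 1081))  # (1920, 1080)
print(even_dimensions(1280, 720))   # (1280, 720), already even so unchanged
```

At most one pixel is lost per dimension, which should be invisible on a contact sheet.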

I also did a benchmark with the current encoding settings (libaom cpu-used 5, 10-bit) vs SVT-AV1 preset 5 fast-decode 8-bit, and on my AVX2 capable laptop CPU, the 8-bit SVT-AV1 file took 42% fewer CPU cycles to decode. The command I used was /usr/bin/time -p ffplay -autoexit -noframedrop -loop 100. The test files are below:

(VMAF is 95.12 for the libaom file, and 95.93 for the SVT-AV1 file, but they look to be pretty much the same quality to my eye. And IME SVT-AV1 tends to look a bit worse visually compared to what VMAF indicates).

Encoding took pretty much an identical amount of CPU time, but libaom took 32% longer in terms of wall clock time, probably because it's not very well parallelized. This was on a dual-core 4-thread system as well, so the difference would likely be even more pronounced on a higher core count machine.

I also tried libaom 8-bit, but for some reason it's using almost 2x the CPU time of 10-bit during the encoding process, which definitely shouldn't be happening. I just won't bother with it for now, since the difference won't be as dramatic as SVT-AV1's fast decode tune anyway. I'm using git builds of FFmpeg and the encoders, so I guess bugs are going to happen.