ArdenButterfield / stammer

Recreate any audio track by rearranging the frames of another video
MIT License
442 stars 31 forks source link

Move video code to class, keep video frames in memory, 8-bit PNG #48

Closed Firepal closed 1 year ago

Firepal commented 1 year ago
Firepal commented 1 year ago

Gonna make an ImageFetcher class to simplify integrating my decaying image cache with tiling

Firepal commented 1 year ago

Moved all image fetching and video encoding to a VideoHandler class.

Encoding from memory works wonders, just need to re-implement encoding from disk

Firepal commented 1 year ago

Re-implemented encoding from PNGs on disk as a VideoHandler

Firepal commented 1 year ago

On another note, I am experimenting with having 8-bit PNG frames instead of the standard 16-bit.

There's two different ways ffmpeg allows this:

The former is a little on the slow side, but looks much better than the restricted palette that the latter uses. I might just keep the former

Firepal commented 1 year ago

Added an argument for video-in-memory and the faster pal8 mode. Do not merge right now, gotta test that stuff is working correctly

Firepal commented 1 year ago

should be good for testing :)

This new "video in memory" mode is fast enough that it could probably be the default... use it with -vim

Firepal commented 1 year ago

I can't be decided it seems, so now -vim is passed by default, and we still generate full 16-bit PNGs by default because it's fast and looks good. Separating frames on disk can still be done by passing -vod.

I will now stop making spontaneous changes lol :]

ArdenButterfield commented 1 year ago

This is very cool! Your code looks awesome and very clean and readable, which is super important! There was one little bug:

    if video_in_mem:
        handler = VideoHandlerMem(carrier_path,output_path,TEMP_DIR,matcher,carrier_framecount,video_frame_length,color_mode)
        handler.cache.decay = MEM_DECAY
    else:
        handler = VideoHandlerDisk(carrier_path,output_path,TEMP_DIR,matcher,carrier_framecount,video_frame_length,color_mode)

    if carrier_is_video:

should probably be

    if carrier_is_video:
        if video_in_mem:
            handler = VideoHandlerMem(carrier_path,output_path,TEMP_DIR,matcher,carrier_framecount,video_frame_length,color_mode)
            handler.cache.decay = MEM_DECAY
        else:
            handler = VideoHandlerDisk(carrier_path,output_path,TEMP_DIR,matcher,carrier_framecount,video_frame_length,color_mode)    

Otherwise we get errors when our carrier is audio-only.

I did have just a couple questions: one, in my mind it would make more sense to have 1 flag to toggle between vim and vod. What do you think?

Second, I did some basic time profiling of the code and didn't find that the changes gave much of a speed-up at all? In fact, 8full almost always made the code take longer than full, both when I tested it on my computer with pitiful ram, and on a much better-equipped server. Did you have similar experiences with 8full slowing down the code? I noticed that vim mode made the runtime a lot longer, but then I realized that I was missing the point, since the goal isn't run time optimization, but disk usage optimization.

Anyway I hope my review didn't come across as too critical! I really appreciate the work that you've put into this project, in this pr and elsewhere, and the whole cache idea is absolutely brilliant.

Firepal commented 1 year ago

Thanks for reviewing the code! Nice catch, will move the code accordingly.

in my mind it would make more sense to have 1 flag to toggle between vim and vod. What do you think?

Hm, do you mean a flag like the audio matcher mode with multiple choices? Because I like that idea. I'm already thinking of generalizing the "decaying cache" idea to work in disk mode too.


8full almost always made the code take longer than full, both when I tested it on my computer with pitiful ram, and on a much better-equipped server.

Yep, I am aware unfortunately. The per-frame palette generation done by ffmpeg when using 8full is a big speed bottleneck. I implemented 8fast, which uses the "faster pal8 mode" I mentioned in an earlier message, but it looks meh.

This is why full is the default; it is always faster and looks better. And the decaying cache makes it practical...

Since the 8-bit mode is kind of impractical, it could be scrapped. Thoughts?


I noticed that vim mode made the runtime a lot longer, but then I realized that I was missing the point, since the goal isn't run time optimization, but disk usage optimization.

Well, it is intended to be a disk usage optimization, but speed can still be optimized.

In vim mode, when we fetch one frame, we cache several surrounding frames too. This is defined in video_out.py with the frames_lookahead and frames_backtrack variables.

During a frame fetch, we call ffmpeg. It has to read the carrier file, seek backwards, find a keyframe, and finally decode the video until the frames we want. Then we cache them and encode the one frame we are currently focused on.

If the modulator is very varied, those extra cached frames may have been wasted processing time in the long run. (Video decoding and PNG encoding is not free...) On the flip side, if the modulator has a high custom frame length and/or is monotone, we risk unnecessarily calling ffmpeg too often, because we didn't fetch enough lookahead frames before.

It's a balancing act, but it can probably be mitigated with some good math behind the frames_lookahead variable.


Anyway I hope my review didn't come across as too critical! I really appreciate the work that you've put into this project, in this pr and elsewhere, and the whole cache idea is absolutely brilliant.

Criticism is welcome as long as it is constructive, and you've given me important feedback here! (Also, appreciate your appreciation of the code, i enjoy writing it :)

Firepal commented 1 year ago
Firepal commented 1 year ago
Firepal commented 1 year ago

I'm almost thinking of replacing --color_mode with --image_format so we could choose between ('png16','png8','jpeg') but I don't know if that would really be useful and I feel like this PR is already bloated as it is. IDK.

Firepal commented 1 year ago

Found an error case: I get an "unidentified image error" when the matcher mode is "combination" and the frame length is set to 0.4...

ArdenButterfield commented 1 year ago

Interesting... I know there's been a recurring error that people have been bringing up about the code trying to access frames that don't exist (frame number too big), and I wonder if that may have the same root cause. I wanna give the whole frame management system an overhaul because it's pretty messy at the moment, but that's an issue for the future... I think we can go ahead and merge this PR. You've done awesome work here.

ArdenButterfield commented 1 year ago

Ok just checked it out. It looks great, I really like how you've set up the modes/flags for the user. I'm gonna go ahead and merge.

Firepal commented 1 year ago

@ArdenButterfield I found the culprit to this bug, it is unrelated to what people have been reporting but is extremely important for correctness:

https://github.com/ArdenButterfield/stammer/blob/55c4a2cde7593c840e8b29d90d3344f3324f03c4/video_out.py#L138 This line should use self.frame_length, the video frame length! How'd I miss this?

Ugh. I guess it wouldn't be fun if the PR was perfect... :P