continue-revolution / sd-webui-animatediff

AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI

[Bug]: Controlnet Framecount not matching #474

Closed. Rauzer closed this issue 6 months ago.

Rauzer commented 6 months ago

Is there an existing issue for this?

Have you read FAQ on README?

What happened?

When generating ControlNet frames through AnimateDiff with a video as input, the number of ControlNet frames (which I also save separately for further iterations) matches neither the actual frame count of the video nor what AnimateDiff says it should be. The full video is read, but the end result has a different frame count and fps.

A 17-second video at 30 fps should be around 510 frames (Windows only shows whole seconds, so there is some variance). In this case AnimateDiff reports 528 frames at 30 fps, but generates only 373 frames.

Steps to reproduce the problem

  1. Go to AnimateDiff and load a video.

  2. Enable ControlNet (any unit should do; I've used depth and openpose predominantly).

  3. AnimateDiff should show the correct fps and frame count of the video. (screenshot: AnimatediffSettings)

  4. Double-check with Windows properties to see whether the calculated data seems correct (17 × 30 ≈ 510, with some variance as only whole seconds are displayed). (screenshot: WindowsProperties)

  5. Press Generate. AnimateDiff will now extract the frames before passing them as input to ControlNet.

  6. AnimateDiff does not generate the correct number of frames. (screenshot: AnimatediffOutput)

What should have happened?

All frames should have been passed to the ControlNets/frame generation, instead of only a subset at a different (but consistent) fps.

Commit where the problem happens

webui: https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/bef51aed032c0aaa5cfd80445bc4cf0d85b408b5

extension:

What browsers do you use to access the UI ?

No response

Command Line Arguments

--xformers

Console logs

The process runs normally and successfully. No discernible errors or warnings, just a mismatching frame count.

Additional information

The frame extractor used is FFmpeg.

Rauzer commented 6 months ago

If I run FFmpeg manually on the video file through cmd:

ffmpeg -i input.mp4 Frames\out%3d.png

where I do not specify any other parameters, I do get 528 frames.

(screenshot: FFMPEG Manual)
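As a more precise cross-check than Windows properties, the decoded frame count can also be read with ffprobe (a standard ffprobe invocation, not something the extension runs):

ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of default=noprint_wrappers=1:nokey=1 input.mp4

This decodes the video stream and prints only the number of frames actually read.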

continue-revolution commented 6 months ago

It is probable that my script gets something different because of a different parameter setup. Please just use an extracted frame path instead and you will get 528 frames.

Rauzer commented 6 months ago

Having looked at the code itself, it is because of the higher fps of the source video, and because the AnimateDiff extension runs frame-duplication elimination through FFmpeg. (I am no expert in the usage of FFmpeg.)

Link to code line: https://github.com/continue-revolution/sd-webui-animatediff/blob/a81565d906c0f16c9a9b95cd80af7b8bafda7cb3/scripts/animatediff_utils.py#L82C1-L83C104

Link to documentation: https://www.ffmpeg.org/ffmpeg-all.html#mpdecimate
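For reference, a duplicate-dropping extraction along those lines looks roughly like this (a sketch based on the mpdecimate documentation; the exact filter options the extension passes may differ):

ffmpeg -i input.mp4 -vf mpdecimate -vsync vfr Frames\out%3d.png

The mpdecimate filter drops frames that barely differ from the previous one, and -vsync vfr stops ffmpeg from re-duplicating frames to keep the original rate, which is why fewer images come out than the source frame count.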

Obviously this is not bad, especially with a high-fps video. You just end up with a 'surprising' number of frames compared to what AnimateDiff says in advance. It matters if you do what I am doing: letting it run once with 1 step and a ControlNet specifically to get the ControlNet maps, to reuse later, since that makes it easier to make multiple variants/change prompts, and on my system it slightly reduces VRAM usage.

I then directly use the generated ControlNet images later on as batch input for my ControlNets, instead of requiring them to be regenerated. But it does mean you end up with, e.g., a 12-second video (at the same fps) as opposed to 17 seconds.

So as you say, your parameters differ from the 'default' (no frame removal). That makes sense, except that it does influence the output length (it can cause a mismatch) or speeds up the overall animation.

continue-revolution commented 6 months ago

I see. I am not an expert in ffmpeg either. My ffmpeg code is copied from https://github.com/cyber-meow/anime_screenshot_pipeline/blob/main/anime2sd/extract_frames.py

I think it's fine to preserve the current parameter setup. If users have special needs, they can ask GPT for a frame extraction command and use a path to frame images instead.
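For example, a duplicate-preserving extraction at a fixed rate could look like this (the fps value and the output pattern are illustrative, not what the extension uses):

ffmpeg -i input.mp4 -vf fps=30 frames\out%05d.png

Pointing the AnimateDiff frame path at such a folder bypasses the built-in duplicate-removing extraction entirely.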

Rauzer commented 6 months ago

Yeah, I was just surprised I ended up with a 13-second video when my input was 17. But the full motion was there (just sped up), and it was because of this entire process: with the skipped frames, the fps taken from the input video makes it speed up (duplicated frames are removed, yet the fps remains the same, so the video is sped up).

It does work fine if you manually extract frames beforehand and use a frame folder as input; then you get everything, which also means the generated output lines up with the duration of the original video (if you set the fps to the same value).

It's not what I would expect, but that's a different story. I could manually fix the previous video by going back to frames and recompiling at a 'lower' fps to the appropriate length. (I needed to because I manually add the video's audio back after the entire process using FFmpeg.)
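As a rough sketch of that recompile-plus-audio step (file names, frame rate, and codec flags are illustrative; pick the frame rate so frame count divided by fps matches the original duration, e.g. 373 frames over 17 s is about 22 fps):

ffmpeg -framerate 22 -i Frames\out%03d.png -i input.mp4 -map 0:v -map 1:a -c:v libx264 -pix_fmt yuv420p -shortest output.mp4

This re-encodes the frame sequence at the lower rate and muxes the audio track from the original video back in.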