BrokenSource / DepthFlow

🌊 Images to → 2.5D Parallax Effect Video. A Free and Open Source ImmersityAI alternative
https://brokensrc.dev
MIT License

How to improve video generation speed? #59

Closed C00reNUT closed 1 day ago

C00reNUT commented 1 day ago

🔘 Request type

General feedback

🔘 Description

Hello, thank you for making this public. I tried some older versions about half a year ago, and the quality in the recent version is much better!

May I ask: I am trying to improve rendering speed, and I saw a comment on the ComfyUI implementation (https://github.com/akatz-ai/ComfyUI-Depthflow-Nodes/issues/7) claiming that "on RTX 4090 I can complete a 1024x1024 image as a 5 second animation at 30 FPS (so total 150 frames) in 3.5 seconds". I am trying the CLI with

xvfb-run depthflow h264 --preset ultrafast input -i image.jpg main -w 640 -h 360 -o ./output_ultrafast.mp4

and I am getting around 5 minutes on a 3090 card... The speed difference is quite big, considering a 3090 should only be about half as fast as a 4090... I guess most of it comes down to the FFmpeg rendering settings. I have tried different preset flags, but it didn't make any difference...
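Putting rough numbers on the gap (illustrative arithmetic only, using the figures quoted above):

```python
# Effective throughput implied by the two reports (illustrative only)
frames = 150                          # 5 seconds at 30 FPS
comfy_4090_seconds = 3.5              # ComfyUI node on an RTX 4090
my_3090_seconds = 5 * 60              # ~5 minutes via xvfb-run on a 3090

comfy_fps = frames / comfy_4090_seconds   # ~42.9 FPS
my_fps = frames / my_3090_seconds         # 0.5 FPS

print(round(comfy_fps, 1))                # 42.9
print(round(comfy_fps / my_fps))          # ~86x slower, far beyond a 2x hardware gap
```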

So I thought I would ask whether there is some quick fix that could speed up the rendering. I haven't gone through the code, the API, or the ShaderFlow library yet; I will do that tomorrow...

Btw, great code and very nice documentation; very few single-developer projects have this quality!

Tremeschin commented 1 day ago

Thanks for the kind words!

Please avoid xvfb-run at all costs: it only supports CPU rendering, so you're not using the GPU at all 🙂

I've got a lengthy write-up listing cloud providers that are known to work (or not) at https://brokensrc.dev/get/docker/, as well as instructions for running locally and example Docker files in the monorepo. Creating the OpenGL context with EGL is a must for servers/true headless rendering speeds on Linux.

My 3060 can pump out about 580 FPS max at 1080p in raw rendering speed, but my 5900X CPU can only do about 2 frames per second at 100% usage; GPUs are that much faster 😆

There's a CPU memory 'bottleneck' when reading the frames and sending them to FFmpeg, though: my system can encode about 120 FPS at 1080p with the current default H.264 settings (slow preset) in FFmpeg. So yeah, it's at least two orders of magnitude faster when the GPU is working properly.
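A quick back-of-envelope sketch of why that pipe is the bottleneck (illustrative arithmetic, assuming uncompressed RGB frames are read back and piped to the encoder):

```python
# Raw RGB frames at 1080p are large, so reading them back from the GPU
# and pushing them through a pipe to FFmpeg dominates at high frame rates.
width, height, bytes_per_pixel = 1920, 1080, 3
frame_bytes = width * height * bytes_per_pixel   # 6,220,800 bytes (~6.2 MB) per frame

fps = 120                                        # encoding rate quoted above
throughput_mb_s = frame_bytes * fps / 1e6        # data rate through the CPU/pipe

print(frame_bytes)                               # 6220800
print(round(throughput_mb_s))                    # ~746 MB/s of raw frames
```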

I can help you further if you tell me whether it's your own hardware or some provider not listed in the docs!

C00reNUT commented 1 day ago

Thank you for such a quick response!

Yes, I am using xvfb-run because I am running it on my own remote server (I can modify anything on it), but when I don't use xvfb-run I get "ValueError: Failed to initialize glfw". I have only tried the CLI so far; maybe this won't happen when running Python code? Also, there must be some GPU utilization, because nvidia-smi shows around 700 MB used during inference...

"Creating the opengl context with egl is a must for servers/true headless rendering linux speeds": thank you for pointing me in the right direction, I will try it...

Tremeschin commented 1 day ago

Try WINDOW_BACKEND=headless depthflow h264 (...), I forgot to mention that 😅
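For anyone driving DepthFlow from Python rather than the CLI, the same environment variable can presumably be set in-process before the library initializes its window (the import path and scene API in the comments are hypothetical; check DepthFlow's docs for the real ones):

```python
import os

# Equivalent of the `WINDOW_BACKEND=headless depthflow ...` CLI prefix:
# set the variable before DepthFlow/ShaderFlow creates its window, so the
# headless backend is picked instead of GLFW (which needs a display server).
os.environ["WINDOW_BACKEND"] = "headless"

# Hypothetical usage, for illustration only:
# from depthflow.scene import DepthScene
# scene = DepthScene()
# scene.main(output="./output.mp4")

print(os.environ["WINDOW_BACKEND"])  # headless
```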

C00reNUT commented 1 day ago

Yes, it took 3 seconds instead of 5 minutes xD This is proof that ChatGPT and Claude still don't always provide the best solution :D

Thank you very much for such a quick response; you saved me a lot of time, as I was about to go through the code tomorrow :)

It is a wonderful tool; it works better than all the implementations I have seen on GitHub.

C00reNUT commented 1 day ago

Thank you once again!

Tremeschin commented 1 day ago

Heh, Claude and GPT... flashbacks of spicy conversations trying to do simple stuff in code

Awesome! I'll add this to the examples documentation somewhere, as I missed it. In fact, I haven't even started working on the ShaderFlow docs, where it should really live 😓; I want to rewrite some files before changing focus.

Happy to help! Feel free to ask me anything in the future as well :) I also have some community groups if you're interested.

C00reNUT commented 1 day ago

> Heh, claude and gpt.. flashback of spicy conversations wanting to do simple stuff on code

They are pretty good for quick prototyping. I actually recommend them for writing documentation; Claude especially is surprisingly good at docstrings and the like. And judging by the benchmarks for the new Qwen code model, there will soon be a nice local option too. Pretty good for simple statistical next-word prediction based on previous words...

Btw, in case you get bored or want to add some extra functionality, this could be a nice addition to DepthFlow: https://github.com/christian-byrne/infinite-parallax. It could use either depth maps or something like https://github.com/facebookresearch/sam2, at least I think so :) They are using diffusers models for inpainting, so the inference is quite heavy...

Tremeschin commented 1 day ago

Ha, I use them all the time, but love and anger are the same point on opposite sides of a circle LOL

I did some work with Qwen too; it's indeed one of the best small models for running locally. I have code for Ollama in the main repo for some future ideas 👀

One ComfyUI user did something similar to this infinite-parallax repo (which was actually one of the early motivations for the project before it was dropped), and I made good progress yesterday (see the discussion asking to fill in the gaps in the repo), so much potential hehehe

As for sam2, yeah, those are quite heavy models for making image-to-video, but they're doing something DepthFlow could never do: moving objects' positions and adding new data with natural motion 😁