Open · denyshubh opened this issue 9 months ago
I may not be following correctly. Is this somehow different from the already implemented Batch Manager node? That already exists to handle the task of splitting a large video into batches and re-combining the batches afterwards.
@AustinMroz Hey, I've not had a lot of time to look into Batch Manager, but my impression of it is that it does not quite fix the RAM problem in the intended/intuitive way. I apologize for not being able to provide feedback on this for a long time.
For techniques like AnimateDiff that need all the images as part of the same tensor, Batch Manager is a problem: since it ends up producing lists of tensors (of a particular batch size), Comfy interprets that by running each batched entry separately. For example, if you were to load 128 frames through Batch Manager with a batch length of 16, it would end up making 8 separate runs that get stitched together at the end, rather than letting the KSampler handle things with knowledge of the whole intended tensor (one tensor containing all the images/latents, rather than a list).
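To make the distinction concrete, here's a tiny sketch (numpy standing in for torch tensors, and toy 8x8 frames standing in for real ones) of the difference between the list-of-batches that Batch Manager produces and the single tensor that AnimateDiff-style sampling wants:

```python
import numpy as np

# 128 tiny stand-in frames (8x8 RGB) instead of real video frames
frames = [np.zeros((8, 8, 3), dtype=np.float32) for _ in range(128)]

# Batch Manager style: a list of 16-frame tensors, which Comfy runs
# as 8 separate executions stitched together afterwards
batches = [np.stack(frames[i:i + 16]) for i in range(0, 128, 16)]

# Single-tensor style: one (128, 8, 8, 3) batch the KSampler sees whole
single = np.stack(frames)

print(len(batches), batches[0].shape, single.shape)
```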
The big point of comparison is Load Images vs Load Video for the same amount of frames. Load Images does not have the same RAM issues, which means the number of images itself isn't the problem, but rather something in the loop that loads and concatenates the frames with opencv/numpy.
I'm not too sure how Batch Manager works just from looking at the code, but it would be best if we could figure out what in the code makes Load Video use up so much RAM when Load Images doesn't have the same issue. Maybe the batch logic in Batch Manager can be repurposed to create a single tensor from the list immediately, since Batch Manager usage does appear to fix the memory issue (and combining all the images at the end in Video Combine doesn't cause a memory issue either). Assuming there's no problem with that, we could hardcode a sensible internal batch value so that we avoid the memory issue altogether, make Batch Manager usage obsolete, and get the desired single-tensor behavior.
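One way that repurposing could look - purely a sketch with made-up helper names, not VHS's actual code - is to preallocate the full output array and fill it in as frames stream in, so the per-frame Python list and the final concatenated copy never coexist in memory:

```python
import numpy as np

def frames_to_single_tensor(frame_iter, n_frames, height, width):
    # Hypothetical helper: allocate the final (n_frames, H, W, 3)
    # array once, then copy frames into it as they stream in, instead
    # of building a list and concatenating at the end (which briefly
    # doubles peak memory).
    out = np.empty((n_frames, height, width, 3), dtype=np.float32)
    for i, frame in enumerate(frame_iter):
        out[i] = frame
    return out

# Usage with 10 dummy 4x4 frames
dummy = (np.full((4, 4, 3), i, dtype=np.float32) for i in range(10))
tensor = frames_to_single_tensor(dummy, 10, 4, 4)
print(tensor.shape)  # (10, 4, 4, 3)
```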
My biggest gripe with custom Comfy nodes, ever since I first got into comfy dev work, has been the confusing interchange of lists and batched tensors, and if possible I'd like VideoHelperSuite to not have to play on that level. In cases where it must, it should be as clear as possible to the user that the tensors are (functionally) a list instead of a single tensor batch.
Always glad to hear your insight on things and willing to cede to your vision. Projects I've started don't get much buzz.
As a short description of the Batch Manager: It is entirely removed from the ComfyUI latent batches dimension. It just automates re-queuing the entire workflow with a capped number of frames to process a single video input into a single video output.
The whole workflow batching was born of a reasonable request. As others were going so far as to test and open pull requests into the development branch for it, I figured releasing it was in-line with your recommendation to not hold back updates overlong.
I recognize that its "frames_per_batch" produces strict windows that break temporal cohesion, but figured it a best-effort implementation at automating what was otherwise already being suggested to users. Communicating functionality and limitations to users could probably be improved, though. Of course, I'd certainly prefer a full solution (mem-mapping latents?), but that seems like it'd break established interfaces too much for my taste.
I, until now, had no reason to believe that load video had worse performance than load images. I had assumed that the greater number of issues opened was merely a consequence of the ease with which users could process large amounts of data, but I did some testing just in case this was hubris getting the better of me. I loaded the first 500 frames of a 1920 x 1080 video with each of the load nodes piped into a quick dummy image output node and measured the total memory used by ComfyUI.
load video single
12,906,324
12,897,020
12,896,940
500 frames on disk: 848,684
load images
24,944,952
24,944,536
24,944,852
3x repeat load video
12,910,108
3x repeat load images
24,896,008
After accounting for imports, this matches my expectations of Load Video using half the memory of Load Images. If there's some underlying issue causing Load Video to use significantly more memory, it's not something I've been able to reproduce.
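For anyone wanting to reproduce measurements like the above, peak process memory can be read with the standard library - a sketch assuming Linux, where ru_maxrss is reported in kilobytes (macOS reports bytes):

```python
import resource

def peak_rss_kb():
    # Peak resident set size of the current process (KB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kb()
buf = bytearray(50 * 1024 * 1024)  # touch ~50 MB so the peak moves
after = peak_rss_kb()
print(before, after)
```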
EDIT: It should be quite doable to
However, the caching of output will mean that this would need to be approximately divided by the number of nodes in the workflow that process image data, and that seems like it'd be tricky to get right.
Gotcha, thanks for looking into it! My impression about the mem issue was due to previous reports about manually exporting frames for a video and then loading them manually, but that was before your memory optimizations, so that likely just isn't reproducible anymore.
The workflow requeueing approach is interesting - does the whole workflow get requeued, or does it do some clever things to only redo certain parts of the workflow? There are definitely a lot of cases where this approach could be useful where there is no AD involved/frames can be handled separately. Thanks for explaining!
I think the best way forward is to create a video that would normally cause a RAM OOM when loaded as video (for a particular amount of RAM), and see if exporting the images separately and then using Load Images changes the behavior - my hunch is that some intermediate part of the process causes the bottleneck. Assuming we can confirm this is still an issue, we can experiment with alternate approaches to loading video frames, like your method from a few months ago of using ffmpeg to export frames and then loading them in. If both Load Video and Load Images hit the same OOM limit, then the only real suggestion we can make besides Batch Manager would be to use the "force_size" input to resize the images as they load, so that they use less space.
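Rough arithmetic for why force_size helps, assuming float32 RGB frames (which is how ComfyUI holds IMAGE tensors):

```python
# One 1920x1080 float32 RGB frame:
bytes_per_frame = 1920 * 1080 * 3 * 4        # 24,883,200 bytes (~23.7 MiB)

# 500 such frames, as in the 500-frame test above:
total = 500 * bytes_per_frame                # ~11.6 GiB
print(bytes_per_frame, total / 2**30)

# Halving each dimension with force_size quarters the footprint:
print((960 * 540 * 3 * 4) * 500 / 2**30)     # ~2.9 GiB
```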
Just to make sure my understanding is correct, with the Batch Manager, does Video Combine produce a single video? And if so, does it do this by concatenating the images into a single tensor, or some other method of combining the videos together? If it does combine them into one tensor before making the video, that would point toward an intermediate bottleneck in Load Video, and the solution would be to try to replicate how those images are combined but in Load Video instead.
It applies the same state caching as normal, but with the Batch Manager node (and everything dependent upon it) marked stale. Load Video and Video Combine both have special handling where the input/output handler is held open between executions. For Load Video, this means there's no extra latency from seeking through the frames that are skipped.
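The requeueing pattern can be sketched as a generator that keeps its position between windows - an illustrative analogy (not the actual VHS implementation) for why later batches need no re-seek:

```python
import itertools

def windowed(source, frames_per_batch):
    # Yield successive lists of up to frames_per_batch items. The
    # underlying iterator keeps its position between windows, just as
    # a held-open video handle keeps its read position between
    # workflow executions.
    it = iter(source)
    while True:
        window = list(itertools.islice(it, frames_per_batch))
        if not window:
            return
        yield window

print(list(windowed(range(10), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```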
Please let me know if you find anything. I have a recently updated load_video_alpha branch in which this same functionality is introduced to the old ffmpeg loading code, but I haven't done any memory comparisons yet and I'm not currently intending for the branch to be merged.
To confirm, with the Batch Manager, Video Combine produces a single output file, but frames are no longer held in memory after a workflow completes. Instead, after each workflow sub-execution the output folder contains a partial video file with the encoded frames up to that point. It should be currently possible (with no code changes, but a beefy computer) to have Load Video (Path) take a url to a video stream as an input file, begin workflow execution, and have a seperate program stream from the partial output file to create a never-ending output stream.
EDIT: Quick testing shows the ffmpeg load code taking a negligibly smaller amount of memory, but it took significantly longer to execute (30s for ffmpeg vs 6s for opencv). Performance has been fixed.
single Load Video (ffmpeg)
12782368
12819692
12779780
x3 Load Video (ffmpeg)
12821056
@AustinMroz @Kosinkadink - Hi guys, this works well! Thanks for this! 🙏
I was able to upscale Full-HD video to 4k with no issues with the Meta Batch Manager node (16 frames per batch, the default).
Just one question - how can we execute one more node, such as an additional upscale node (saving another copy of the already upscaled video), once the first Video Combine node has finished successfully?
How can we get the output path once the Video Combine node finishes its final operation (all batches) - is there any support for this?
Thanks!
Describe the bug
A memory allocation error occurs when attempting to load large videos using the ComfyUI-VideoHelperSuite's VHS_LoadVideoPath function on Windows OS. The error is encountered in the load_video_cv function when trying to create a PyTorch tensor from a numpy array of video frames.
To Reproduce
Steps to reproduce the behavior:
Use ComfyUI-VideoHelperSuite to load a large video file.
Expected behavior
The video should load successfully without causing a memory allocation error, regardless of the video size.
Error Message
Environment:
Additional context
The issue seems related to attempting to load the entire video into memory, which is not feasible for large videos.
Suggested Solution: Batch Processing of Images
To mitigate this issue, I suggest implementing a batch processing mechanism for video frames. This approach involves loading and processing a set number of frames at a time, thereby significantly reducing the memory footprint. Here's a conceptual approach to integrate batch processing:
- Modify cv_frame_generator to yield frames in batches rather than attempting to load all frames into memory at once.
- Update the code that consumes cv_frame_generator to handle these batches accordingly, processing each batch of frames before moving on to the next.
This method would allow ComfyUI-VideoHelperSuite to handle videos of any size by limiting the number of frames in memory at any given time.
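A minimal sketch of that suggestion - the per-frame source here stands in for OpenCV's VideoCapture.read loop, and the function name only mirrors the existing cv_frame_generator; this is not the actual VHS code:

```python
import numpy as np

def cv_frame_generator_batched(frame_source, batch_size):
    # Accumulate frames from a per-frame source and yield them batch
    # by batch, so at most batch_size frames sit in memory at once.
    batch = []
    for frame in frame_source:
        batch.append(frame)
        if len(batch) == batch_size:
            yield np.stack(batch)
            batch = []
    if batch:  # final partial batch
        yield np.stack(batch)

# Usage with 10 dummy 4x4 RGB frames and batch_size=4
frames = (np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(10))
shapes = [b.shape for b in cv_frame_generator_batched(frames, 4)]
print(shapes)  # [(4, 4, 4, 3), (4, 4, 4, 3), (2, 4, 4, 3)]
```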