Closed: brbbbq closed this issue 11 months ago.
@brbbbq That is an out-of-memory error. The latest commit only changed RIFE's code, so it is not related to your issue: https://github.com/Fannovel16/ComfyUI-Frame-Interpolation/commit/ad8dd1f9c6bea3608ea8b8c445c79718ca99df58
I was able to get it to work again by limiting the number of input frames. The image size is 768x968, and I have a 3080 Ti with 12 GB of VRAM. I accidentally updated ComfyUI while just trying to figure out what version it is. I still don't know the version number (it's not listed anywhere I can find), but apparently it's the most recent one now.
After updating, running the interpolation nodes no longer gives me an OOM error; it just freezes with no error reported, and I have to restart the client, which is much worse.
Luckily, before updating, I discovered that I can run a batch of 105 frames, but any more than that and it hangs. I assumed the frame-caching feature would mitigate VRAM usage; is it loading all the frames at once?
Looks like I'm using CUDA v11.8 and pytorch v1.13
Also want to add, that testing RIFE gave me similar OOM errors, but at larger frame numbers, so it's not isolated to the FILM node. I changed the title of the issue as a result.
> Luckily I did discover before updating, that I can run a batch of 105 frames, but any more than that and it hangs. I assumed the frame caching feature would mitigate VRAM usage, is it loading all the frames at once?
It caches all the frames in VRAM during node execution, but the model behind it only takes two frames at a time. Clearing the cache should have prevented OOM in your case, or maybe it's what causes the hanging. I'll add some short code to print when cache clearing kicks in.
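The clear-and-notify pattern described above could be sketched like this (illustrative only; `clear_vfi_cache`, `interpolate_all`, and the `model` callable are hypothetical names, not the node's actual code):

```python
import gc

def clear_vfi_cache(cache):
    """Drop cached frame references and hand unused VRAM back to the driver."""
    print("Comfy-VFI: Clearing cache...")
    cache.clear()    # drop Python references to cached frames
    gc.collect()     # collect anything still lingering
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached-but-unused VRAM
    except ImportError:
        pass  # torch not installed: nothing GPU-side to free in this sketch
    print("Comfy-VFI: Done cache clearing")

def interpolate_all(model, frames, clear_every=10):
    """Run a 2-frame model pairwise, clearing the cache every `clear_every` pairs."""
    outputs, cache = [], []
    for i in range(len(frames) - 1):
        result = model(frames[i], frames[i + 1])
        outputs.append(result)
        cache.append(result)
        if (i + 1) % clear_every == 0:
            clear_vfi_cache(cache)
    return outputs
```

Note that `torch.cuda.empty_cache()` only returns memory the caching allocator is no longer using, which is why the Python-side references must be dropped first.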
> it just freezes with no error reporting and I have to restart the client, much worse.

Does RIFE also freeze Comfy?
> Looks like I'm using CUDA v11.8 and pytorch v1.13
I think you used the wrong Python, as it shows "cpu". Run
ComfyUI_windows_portable\python_embeded\python -s
@brbbbq Update ComfyUI-Frame-Interpolation. My commit doesn't fix the issue, but it should help pinpoint where it is by printing a notice when cache clearing kicks in.
Nice, got the error messages back :)
got prompt
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
!!! Exception during processing !!!
Traceback (most recent call last):
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Frame-Interpolation\models\film\__init__.py", line 86, in vfi
relust = inference(model, frames[frame_itr], frames[frame_itr + 1], multiplier - 1)
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Frame-Interpolation\models\film\__init__.py", line 35, in inference
prediction = model(x0, x1, dt)
File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/interpolator.py", line 15, in forward
x1: Tensor,
batch_dt: Tensor) -> Tensor:
_0 = (self).debug_forward(x0, x1, batch_dt, )
~~~~~~~~~~~~~~~~~~~ <--- HERE
return (_0["image"])[0]
def debug_forward(self: __torch__.interpolator.Interpolator,
File "code/__torch__/interpolator.py", line 64, in debug_forward
aligned_pyramid1 = __torch__.util.concatenate_pyramids(aligned_pyramid0, forward_flow, )
fuse = self.fuse
_18 = [(fuse).forward(aligned_pyramid1, )]
~~~~~~~~~~~~~ <--- HERE
_19 = {"image": _18, "forward_residual_flow_pyramid": forward_residual_flow_pyramid, "backward_residual_flow_pyramid": backward_residual_flow_pyramid, "forward_flow_pyramid": forward_flow_pyramid, "backward_flow_pyramid": backward_flow_pyramid}
return _19
File "code/__torch__/fusion.py", line 56, in forward
_04 = getattr(_3, "0")
net16 = (_04).forward(net15, )
net17 = torch.cat([pyramid[i2], net16], 1)
~~~~~~~~~ <--- HERE
_13 = getattr(_3, "1")
net18 = (_13).forward(net17, )
Traceback of TorchScript, original code (most recent call last):
File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\interpolator.py", line 160, in forward
@torch.jit.export
def forward(self, x0, x1, batch_dt) -> torch.Tensor:
return self.debug_forward(x0, x1, batch_dt)['image'][0]
~~~~~~~~~~~~~~~~~~ <--- HERE
File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\interpolator.py", line 151, in debug_forward
return {
'image': [self.fuse(aligned_pyramid)],
~~~~~~~~~ <--- HERE
'forward_residual_flow_pyramid': forward_residual_flow_pyramid,
'backward_residual_flow_pyramid': backward_residual_flow_pyramid,
File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\fusion.py", line 116, in forward
net = F.interpolate(net, size=level_size, mode='nearest')
net = layers[0](net)
net = torch.cat([pyramid[i], net], dim=1)
~~~~~~~~~ <--- HERE
net = layers[1](net)
net = layers[2](net)
RuntimeError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 5.84 GiB
Requested : 568.12 MiB
Device limit : 12.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB
Prompt executed in 90.25 seconds
Looks like it fails on the 12th batch, though in my tests I would get a failure at 110 frames. I'm guessing it's because it needs to repeat the last frame, so each batch is effectively 9 frames? (9 frames x 12 batches = 108 frames.)
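That guess can be sanity-checked with a tiny sketch (hypothetical helper, assuming a 10-frame internal batch that repeats the previous batch's last frame for continuity):

```python
def frames_before_failure(internal_batch: int, batches_completed: int) -> int:
    # If each internal batch repeats the previous batch's last frame,
    # only internal_batch - 1 genuinely new frames are consumed per batch.
    return batches_completed * (internal_batch - 1)

print(frames_before_failure(10, 12))  # 108
```

108 sits just above the 105-frame batches that succeeded and just below the 110-frame ones that failed, which is consistent with the guess.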
My Comfyui PyTorch version is 2.0.1:
Is this something that looks fixable from your end? I'm currently trying to develop a more automated workflow that processes the sequence in batches using image batch nodes.
Here's a ComfyUI workflow for doing batches with FILM: 2310232-FILM_batch_workflow-01.json
That's as far as I can get without knowing how to code or write conditional statements. You can repeat the 'Batch 1' group and rewire a few things to extend it for larger batches.
Simply displaying the cache-clearing operations was actually really helpful; knowing when it fails makes it much easier to calculate the right batch sizes. Another thing I noticed: increasing the multiplier on the FILM node requires decreasing the batch size to compensate, as far as I can tell.
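The multiplier/batch-size trade-off follows from the output frame count: a 2-frame interpolator with multiplier m inserts m - 1 new frames between each input pair, so memory pressure grows with both knobs. A rough sketch (illustrative arithmetic, not the node's actual accounting; the frame budget is a made-up stand-in for whatever fits in VRAM):

```python
def output_frames(input_frames: int, multiplier: int) -> int:
    """Frames produced when multiplier - 1 new frames are inserted per adjacent pair."""
    return (input_frames - 1) * multiplier + 1

def max_batch_for_budget(frame_budget: int, multiplier: int) -> int:
    """Largest input batch whose interpolated output stays within frame_budget."""
    batch = 2
    while output_frames(batch + 1, multiplier) <= frame_budget:
        batch += 1
    return batch

print(output_frames(105, 2))         # 209 output frames at 2x
print(max_batch_for_budget(209, 4))  # 53: doubling the multiplier roughly halves the batch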
> Is this something that looks fixable from your end?
Yes. My idea is that all frames are cached in CPU memory and only a certain part is moved to the GPU at a time. Currently I only know how to implement it for 2-frame models, though (stmf and flavr are 4-frame models).
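For a 2-frame model, the idea described (cache everything in CPU RAM, move only the active pair to the GPU) could look roughly like this; `interpolate_cpu_cached` and the `model` callable are placeholder names, not the repo's actual implementation:

```python
import torch

def interpolate_cpu_cached(model, frames_cpu, device="cuda"):
    """Keep every frame in CPU RAM; move only the active pair to the GPU."""
    outputs = []
    for i in range(len(frames_cpu) - 1):
        x0 = frames_cpu[i].to(device)      # only two frames occupy VRAM at once
        x1 = frames_cpu[i + 1].to(device)
        with torch.no_grad():
            mid = model(x0, x1)
        outputs.append(mid.cpu())          # park the result back in CPU RAM
        del x0, x1, mid                    # drop GPU references before the next pair
    return outputs
```

VRAM usage then stays bounded by two input frames plus one output, regardless of sequence length, at the cost of per-pair host-to-device transfers.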
For me personally, I think FILM is the best quality that I've seen so far, so I'm happy to use that.
Regarding the memory error: when I separate the FILM VFI node into batches, it works. I'm guessing something between one FILM VFI node finishing and the next one starting causes the memory to be cleared out correctly.
@brbbbq Can you try the latest commit (https://github.com/Fannovel16/ComfyUI-Frame-Interpolation/commit/f19c7230cb883bf08a6dfb3e2099e75a51fa3318)?
I grabbed the latest version and ran a large job. Works great! Looks like the memory is getting cleared correctly. Here's an updated workflow with looping:
Thanks for fixing it!
any chance you could make this workflow public?
This node was working fine until yesterday; all of a sudden I started getting this runtime error:
I tried reducing the cache clearing to 5 frames, but it didn't help. Here's my workflow:
This is the error from the terminal window: