Fannovel16 / ComfyUI-Frame-Interpolation

A custom node set for Video Frame Interpolation in ComfyUI.
MIT License

VFI Runtime Error TorchScript #23

Closed brbbbq closed 11 months ago

brbbbq commented 11 months ago

This node was working fine until yesterday; all of a sudden I started getting this runtime error: 231023-073303-chrome-1

Tried reducing the clear cache setting to 5 frames, but it didn't help. Here's my workflow: 231023-091527-chrome-1

This is the error from the terminal window:

!!! Exception during processing !!!
Traceback (most recent call last):
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Frame-Interpolation\models\film\__init__.py", line 86, in vfi
    relust = inference(model, frames[frame_itr], frames[frame_itr + 1], multiplier - 1)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Frame-Interpolation\models\film\__init__.py", line 35, in inference
    prediction = model(x0, x1, dt)
  File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/interpolator.py", line 15, in forward
    x1: Tensor,
    batch_dt: Tensor) -> Tensor:
    _0 = (self).debug_forward(x0, x1, batch_dt, )
          ~~~~~~~~~~~~~~~~~~~ <--- HERE
    return (_0["image"])[0]
  def debug_forward(self: __torch__.interpolator.Interpolator,
  File "code/__torch__/interpolator.py", line 64, in debug_forward
    aligned_pyramid1 = __torch__.util.concatenate_pyramids(aligned_pyramid0, forward_flow, )
    fuse = self.fuse
    _18 = [(fuse).forward(aligned_pyramid1, )]
            ~~~~~~~~~~~~~ <--- HERE
    _19 = {"image": _18, "forward_residual_flow_pyramid": forward_residual_flow_pyramid, "backward_residual_flow_pyramid": backward_residual_flow_pyramid, "forward_flow_pyramid": forward_flow_pyramid, "backward_flow_pyramid": backward_flow_pyramid}
    return _19
  File "code/__torch__/fusion.py", line 56, in forward
    _04 = getattr(_3, "0")
    net16 = (_04).forward(net15, )
    net17 = torch.cat([pyramid[i2], net16], 1)
            ~~~~~~~~~ <--- HERE
    _13 = getattr(_3, "1")
    net18 = (_13).forward(net17, )

Traceback of TorchScript, original code (most recent call last):
  File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\interpolator.py", line 160, in forward
    @torch.jit.export
    def forward(self, x0, x1, batch_dt) -> torch.Tensor:
        return self.debug_forward(x0, x1, batch_dt)['image'][0]
               ~~~~~~~~~~~~~~~~~~ <--- HERE
  File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\interpolator.py", line 151, in debug_forward

        return {
            'image': [self.fuse(aligned_pyramid)],
                      ~~~~~~~~~ <--- HERE
            'forward_residual_flow_pyramid': forward_residual_flow_pyramid,
            'backward_residual_flow_pyramid': backward_residual_flow_pyramid,
  File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\fusion.py", line 116, in forward
            net = F.interpolate(net, size=level_size, mode='nearest')
            net = layers[0](net)
            net = torch.cat([pyramid[i], net], dim=1)
                  ~~~~~~~~~ <--- HERE
            net = layers[1](net)
            net = layers[2](net)
RuntimeError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated     : 5.77 GiB
Requested               : 568.12 MiB
Device limit            : 12.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction)
                        : 17179869184.00 GiB
Fannovel16 commented 11 months ago

@brbbbq That is an out-of-memory error. The latest commit only changed RIFE's code, so it is not related to your issue: https://github.com/Fannovel16/ComfyUI-Frame-Interpolation/commit/ad8dd1f9c6bea3608ea8b8c445c79718ca99df58

  1. What is the resolution of your video?
  2. Did you update your ComfyUI? If so, what version is it (including the CUDA and PyTorch versions)?
brbbbq commented 11 months ago

I was able to get it to work again by limiting the number of input frames. The image size is 768x968, and I have a 3080 Ti with 12GB of VRAM. I accidentally updated ComfyUI while trying to figure out what version it is. I still don't know the version number (it's not listed anywhere I can find), but apparently it's the most recent one now.

After updating, running the interpolation nodes no longer gives me an OOM error; it just freezes with no error reporting and I have to restart the client, which is much worse.

Luckily, before updating I did discover that I can run a batch of 105 frames, but any more than that and it hangs. I assumed the frame caching feature would mitigate VRAM usage; is it loading all the frames at once?

Looks like I'm using CUDA v11.8 and PyTorch v1.13: 231023-110848-cmd 231023-104237-python

brbbbq commented 11 months ago

Also want to add that testing RIFE gave me similar OOM errors, but at larger frame counts, so it's not isolated to the FILM node. I changed the title of the issue as a result.

Fannovel16 commented 11 months ago

> Luckily, before updating I did discover that I can run a batch of 105 frames, but any more than that and it hangs. I assumed the frame caching feature would mitigate VRAM usage; is it loading all the frames at once?

It caches all the frames in VRAM during node execution, but the underlying model only takes two frames at a time. Clear cache should've prevented OOM in your case, or maybe it's what causes the hanging. I'll add some short code to print when clear cache kicks in.
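
As an illustrative sketch only (not the node's actual code), the logging will look roughly like a helper that prints around torch.cuda.empty_cache():

# Sketch only, assuming PyTorch with CUDA; not the node's actual implementation.
import torch

def clear_cache_with_log():
    print("Comfy-VFI: Clearing cache...")
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release unused blocks held by the CUDA caching allocator
    print("Comfy-VFI: Done cache clearing")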

> it just freezes with no error reporting and I have to restart the client, which is much worse.

Does RIFE also freeze Comfy?

> Looks like I'm using CUDA v11.8 and PyTorch v1.13

I think you used the wrong Python, as it shows "cpu". Run:

ComfyUI_windows_portable\python_embeded\python -s
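
Once inside that interpreter, a few standard PyTorch calls (shown as a sketch, nothing specific to this repo) will show which build is actually in use:

import torch
print(torch.__version__)          # e.g. 2.0.1+cu118 for a CUDA build
print(torch.version.cuda)         # CUDA version the build targets; None for a CPU-only build
print(torch.cuda.is_available())  # should be True if the GPU build is installed and a GPU is visible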
Fannovel16 commented 11 months ago

@brbbbq Update ComfyUI-Frame-Interpolation. My commit doesn't fix the issue, but it should help pinpoint where it is by printing a notification when cache clearing kicks in.

brbbbq commented 11 months ago

Nice, got the error messages back :)

got prompt
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
Comfy-VFI: Clearing cache...
Comfy-VFI: Done cache clearing
!!! Exception during processing !!!
Traceback (most recent call last):
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Frame-Interpolation\models\film\__init__.py", line 86, in vfi
    relust = inference(model, frames[frame_itr], frames[frame_itr + 1], multiplier - 1)
  File "C:\Apps\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Frame-Interpolation\models\film\__init__.py", line 35, in inference
    prediction = model(x0, x1, dt)
  File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/interpolator.py", line 15, in forward
    x1: Tensor,
    batch_dt: Tensor) -> Tensor:
    _0 = (self).debug_forward(x0, x1, batch_dt, )
          ~~~~~~~~~~~~~~~~~~~ <--- HERE
    return (_0["image"])[0]
  def debug_forward(self: __torch__.interpolator.Interpolator,
  File "code/__torch__/interpolator.py", line 64, in debug_forward
    aligned_pyramid1 = __torch__.util.concatenate_pyramids(aligned_pyramid0, forward_flow, )
    fuse = self.fuse
    _18 = [(fuse).forward(aligned_pyramid1, )]
            ~~~~~~~~~~~~~ <--- HERE
    _19 = {"image": _18, "forward_residual_flow_pyramid": forward_residual_flow_pyramid, "backward_residual_flow_pyramid": backward_residual_flow_pyramid, "forward_flow_pyramid": forward_flow_pyramid, "backward_flow_pyramid": backward_flow_pyramid}
    return _19
  File "code/__torch__/fusion.py", line 56, in forward
    _04 = getattr(_3, "0")
    net16 = (_04).forward(net15, )
    net17 = torch.cat([pyramid[i2], net16], 1)
            ~~~~~~~~~ <--- HERE
    _13 = getattr(_3, "1")
    net18 = (_13).forward(net17, )

Traceback of TorchScript, original code (most recent call last):
  File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\interpolator.py", line 160, in forward
    @torch.jit.export
    def forward(self, x0, x1, batch_dt) -> torch.Tensor:
        return self.debug_forward(x0, x1, batch_dt)['image'][0]
               ~~~~~~~~~~~~~~~~~~ <--- HERE
  File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\interpolator.py", line 151, in debug_forward

        return {
            'image': [self.fuse(aligned_pyramid)],
                      ~~~~~~~~~ <--- HERE
            'forward_residual_flow_pyramid': forward_residual_flow_pyramid,
            'backward_residual_flow_pyramid': backward_residual_flow_pyramid,
  File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\fusion.py", line 116, in forward
            net = F.interpolate(net, size=level_size, mode='nearest')
            net = layers[0](net)
            net = torch.cat([pyramid[i], net], dim=1)
                  ~~~~~~~~~ <--- HERE
            net = layers[1](net)
            net = layers[2](net)
RuntimeError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated     : 5.84 GiB
Requested               : 568.12 MiB
Device limit            : 12.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction)
                        : 17179869184.00 GiB

Prompt executed in 90.25 seconds

Looks like it fails on the 12th batch, though in my tests I would get failure at 110 frames. I'm guessing it's because it needs to repeat the last frame, so each batch is actually 9 frames? (9 frames x 12 batches = 108 frames).

My ComfyUI PyTorch version is 2.0.1: 231023-152452-cmd

Is this something that looks fixable from your end? I'm currently trying to develop a more automated workflow to process the sequence in batches using image batch nodes.
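
(For reference, the manual batching described above boils down to splitting the frame sequence into chunks that overlap by one frame. A minimal Python sketch of that idea, with a hypothetical helper name and nothing ComfyUI-specific:)

# Hypothetical sketch: split frames into chunks that share one boundary frame,
# so each chunk's last frame is reused as the next chunk's first frame.
def split_with_overlap(frames, batch_size=105):
    batches = []
    start = 0
    while start < len(frames) - 1:
        end = min(start + batch_size, len(frames))
        batches.append(frames[start:end])
        start = end - 1  # repeat the boundary frame in the next batch
    return batches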

brbbbq commented 11 months ago

Here's a ComfyUI workflow for doing batches with FILM: 2310232-FILM_batch_workflow-01.json 231023-190408-chrome

That's as far as I can get without knowing how to code or write conditional statements. You can repeat the 'Batch 1' group and rewire a few things to expand it for larger batches.

Simply displaying the cache clearing operations was actually really helpful; being able to see when it fails makes it much easier to calculate the right batch sizes. Another thing I noticed: increasing the multiplier on the FILM node requires decreasing the batch size to compensate, as far as I can tell. 231023-185819-cmd-1
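
(As a back-of-the-envelope check, and assuming the node produces multiplier - 1 new frames per input pair, which is what the inference(..., multiplier - 1) call in the traceback suggests, the output length grows roughly like this:)

# Rough estimate only; assumes multiplier - 1 interpolated frames per input pair.
def output_frame_count(input_frames, multiplier):
    return (input_frames - 1) * multiplier + 1

print(output_frame_count(105, 2))  # 209
print(output_frame_count(105, 4))  # 417 -- a higher multiplier means many more frames to hold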

Fannovel16 commented 11 months ago

> Is this something that looks fixable from your end?

Yes. My idea is to cache all frames on the CPU and only move a certain part to the GPU at a time. Currently, I only know how to implement it for 2-frame models, though (stmf and flavr are 4-frame models).
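
A minimal sketch of that idea for a generic 2-frame model (assuming a model(x0, x1) call; not the node's actual implementation):

import torch

def interpolate_pairs(model, frames_cpu, device="cuda"):
    # Keep every frame on the CPU and move only the current pair to the GPU,
    # so peak VRAM stays roughly constant regardless of clip length.
    out = []
    with torch.no_grad():
        for x0_cpu, x1_cpu in zip(frames_cpu[:-1], frames_cpu[1:]):
            x0 = x0_cpu.to(device)
            x1 = x1_cpu.to(device)
            mid = model(x0, x1)    # hypothetical 2-frame interpolation call
            out.append(mid.cpu())  # move the result back off the GPU
            del x0, x1, mid
    torch.cuda.empty_cache()
    return out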

brbbbq commented 11 months ago

For me personally, FILM has the best quality I've seen so far, so I'm happy to use that.

Regarding the memory error: when I separate the FILM VFI node into batches, it works. I'm guessing something in between one FILM VFI node stopping and the next one starting causes the memory to be cleared out correctly.

Fannovel16 commented 11 months ago

@brbbbq Can you try the latest commit (https://github.com/Fannovel16/ComfyUI-Frame-Interpolation/commit/f19c7230cb883bf08a6dfb3e2099e75a51fa3318)?

brbbbq commented 11 months ago

I grabbed the latest version and ran a large job. Works great! Looks like the memory is getting cleared correctly. Here's an updated workflow with looping: 231102-FILM_interpolation_with_looping

Thanks for fixing it!

drmbt commented 2 months ago

> I grabbed the latest version and ran a large job. Works great! Looks like the memory is getting cleared correctly. Here's an updated workflow with looping: 231102-FILM_interpolation_with_looping
>
> Thanks for fixing it!

Any chance you could make this workflow public?