TianxingWu / FreeInit

[ECCV 2024] FreeInit: Bridging Initialization Gap in Video Diffusion Models
https://tianxingwu.github.io/pages/FreeInit/
MIT License
454 stars 24 forks

Error on generating video larger than 512,512. #14

Closed drnighthan closed 6 months ago

drnighthan commented 7 months ago

I installed the FreeInit-hf code locally, and it runs perfectly at 512,512. Then I set the size to 512,768, and here is the error:

    Traceback (most recent call last):
      File "F:\FreeInit-hf-main\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
        output = await app.get_blocks().process_api(
      File "F:\FreeInit-hf-main\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
        result = await self.call_function(
      File "F:\FreeInit-hf-main\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
        prediction = await anyio.to_thread.run_sync(
      File "F:\FreeInit-hf-main\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
        return await get_asynclib().run_sync_in_worker_thread(
      File "F:\FreeInit-hf-main\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
        return await future
      File "F:\FreeInit-hf-main\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
        result = context.run(func, *args)
      File "F:\FreeInit-hf-main\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
        response = f(*args, **kwargs)
      File "F:\FreeInit-hf-main\app.py", line 287, in animate
        sample_output = pipeline(
      File "F:\FreeInit-hf-main\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "F:\FreeInit-hf-main\animatediff\pipelines\pipeline_animation.py", line 590, in __call__
        latents = freq_mix_3d(z_T.to(dtype=torch.float32), z_rand, LPF=self.freq_filter)
      File "F:\FreeInit-hf-main\animatediff\utils\freeinit_utils.py", line 24, in freq_mix_3d
        x_freq_low = x_freq * LPF
    RuntimeError: The size of tensor a (96) must match the size of tensor b (64) at non-singleton dimension 3

I think the problem is in the LPF part. I printed the shapes of x_freq and LPF: x_freq is torch.Size([1, 4, 16, 96, 64]) and LPF is torch.Size([1, 4, 16, 64, 64]). Since the generation size is 512,768, the latent size should be 96,64, but the filter is still 64,64. Please check the LPF code. Thanks a lot.
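To make the mismatch concrete, here is a minimal NumPy sketch of the frequency-mixing step (the real `freq_mix_3d` in `freeinit_utils.py` uses `torch.fft`; the all-pass mask below is a placeholder, not FreeInit's Gaussian filter). It keeps the low-frequency band of the latent and the high-frequency band of fresh noise, so the filter tensor must match the latent shape element-wise, which is exactly what fails when the filter was built for 64×64 latents but the video is generated at 96×64:

```python
import numpy as np

def freq_mix_3d(x, noise, lpf):
    """Mix low frequencies of x with high frequencies of noise over (T, H, W)."""
    assert x.shape == lpf.shape, f"filter {lpf.shape} != latent {x.shape}"
    axes = (-3, -2, -1)  # frame, height, width
    x_freq = np.fft.fftshift(np.fft.fftn(x, axes=axes), axes=axes)
    n_freq = np.fft.fftshift(np.fft.fftn(noise, axes=axes), axes=axes)
    # element-wise mask: lpf keeps low frequencies, (1 - lpf) keeps high ones
    mixed = x_freq * lpf + n_freq * (1 - lpf)
    mixed = np.fft.ifftshift(mixed, axes=axes)
    return np.fft.ifftn(mixed, axes=axes).real

# A 512x768 generation gives latents of 96x64 (each side divided by 8),
# so the filter must be rebuilt at that resolution too.
x = np.random.randn(1, 4, 16, 96, 64)
noise = np.random.randn(1, 4, 16, 96, 64)
lpf = np.ones((1, 4, 16, 96, 64))  # placeholder all-pass mask
out = freq_mix_3d(x, noise, lpf)
assert out.shape == x.shape
```

With an all-pass mask the output reduces to the original latent, which is a handy sanity check; a shape-mismatched `lpf` raises immediately at the assertion instead of deep inside the broadcast, mirroring the RuntimeError above.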

drnighthan commented 7 months ago

The bug comes from `update_filter` in app.py: it fails to pick up the updated values from the Gradio sliders. By commenting out the middle few lines and swapping the order of height_slider and width_slider in the `update_filter` call, it works well, and I can now successfully generate at 512,768 and even 512,1024. Please take a look.

    # self.set_width = width_slider
    # self.set_height = height_slider
    # self.selected_filter_type = filter_type_dropdown
    # self.set_d_s = d_s
    # self.set_d_t = d_t
    if self.set_width != width_slider or self.set_height != height_slider or self.selected_filter_type != filter_type_dropdown or self.set_d_s != d_s or self.set_d_t != d_t:
        self.update_filter(height_slider, width_slider, filter_type_dropdown, d_s, d_t)
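The underlying pattern can be sketched as a small standalone class (names like `FilterCache` and `maybe_update` are hypothetical, chosen for illustration): the cached settings are compared *first*, and only recorded inside the update routine, so assigning them before the comparison can never mask a change:

```python
class FilterCache:
    """Minimal sketch of the corrected change-detection logic."""

    def __init__(self):
        self.set_width = self.set_height = None
        self.selected_filter_type = None
        self.set_d_s = self.set_d_t = None
        self.updates = 0  # counts filter rebuilds, standing in for the LPF tensor

    def update_filter(self, height, width, filter_type, d_s, d_t):
        # Record the new settings only here, after a change was detected.
        self.set_height, self.set_width = height, width
        self.selected_filter_type = filter_type
        self.set_d_s, self.set_d_t = d_s, d_t
        self.updates += 1

    def maybe_update(self, height, width, filter_type, d_s, d_t):
        # Compare against the cached values BEFORE overwriting them.
        if (self.set_width != width or self.set_height != height
                or self.selected_filter_type != filter_type
                or self.set_d_s != d_s or self.set_d_t != d_t):
            # height first, width second, matching update_filter's signature
            self.update_filter(height, width, filter_type, d_s, d_t)

cache = FilterCache()
cache.maybe_update(768, 512, "gaussian", 0.25, 0.25)  # first call rebuilds
cache.maybe_update(768, 512, "gaussian", 0.25, 0.25)  # unchanged: no rebuild
assert cache.updates == 1
```

The original bug was the inverse of this: the cached fields were overwritten from the sliders before the comparison ran, so the inequality test was always false and the filter was never rebuilt at the new resolution. Passing height and width in a consistent order also matters, since the filter's last two dimensions must line up with the latent's.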
TianxingWu commented 7 months ago

@drnighthan The resolution bug has been fixed. Thanks for your feedback!