KAIST-VICLab / FMA-Net

[CVPR 2024 Oral] Official repository of FMA-Net
https://kaist-viclab.github.io/fmanet-site/
MIT License
623 stars · 44 forks

Deblurring and general video quality enhancement #11

Closed ngaer closed 4 months ago

ngaer commented 6 months ago

First, thanks a lot for sharing the model weights and the code. We did some tests and it works pretty well on certain types of video.

But in our use case, the videos we need to process are high resolution (720p or 1080p). So we don't really need to upscale them; we only want to deblur them to make them look sharper and perform general quality enhancement. We tried running the model with the default 4x upscaling on a 720p video, and it ran out of GPU memory. If we downscale the video to 360p before running it through the model, it works, but the quality improvement over the original 720p video looks minor, so this approach doesn't seem optimal overall.

@GeunhyukYouk Can you please recommend an approach or model configuration that works best for quality improvement of 720p/1080p videos?

GeunhyukYouk commented 6 months ago

Hi, thank you for your interest in FMA-Net.

It seems like the problem you're interested in is improving blurry videos in environments where GPU memory is insufficient.

First, since 720/1080p blurry videos already have sufficiently high resolution, it is feasible to enhance video quality by performing deblurring without SR, as you suggested. Through various ablation studies, we've confirmed that FMA-Net performs quite well even when performing deblurring only. Therefore, setting the scale factor to 1, adjusting the pixel shuffle layer, retraining the model, and testing it on your videos should yield the best quality enhancement results.

However, like the existing FMA-Net, an FMA-Net retrained for deblurring may still not work on 720/1080p videos in your environment due to out-of-memory issues. In that case, you need a series of steps: cut the input sequence into patches, perform deblurring patch by patch, and then recombine the results. Although global operations like flow-guided dynamic filtering and multi-attention may not perform at their best on patches, this can be considered the best approach in memory-limited environments. (The optimal approach, of course, would be to retrain FMA-Net for deblurring and test it on the entire video.)
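For reference, here is a minimal sketch of such overlapped tiling. It assumes a `model` callable that maps a (T, C, h, w) clip patch to a restored patch of the same size (scale = 1, deblurring only); the patch and overlap sizes are arbitrary placeholders.

import torch

def deblur_in_patches(model, video, patch=256, overlap=32):
    """Tile each frame stack into overlapping patches, run the model
    per patch, and stitch the results back together. Overlapping
    borders are averaged to soften seams from the lost global context."""
    T, C, H, W = video.shape
    out = torch.zeros_like(video)
    weight = torch.zeros(1, 1, H, W, device=video.device)
    stride = patch - overlap
    ys = list(range(0, max(H - patch, 0) + 1, stride))
    xs = list(range(0, max(W - patch, 0) + 1, stride))
    # Make sure the last row/column of tiles reaches the frame border.
    if ys[-1] + patch < H:
        ys.append(H - patch)
    if xs[-1] + patch < W:
        xs.append(W - patch)
    for y in ys:
        for x in xs:
            tile = video[:, :, y:y + patch, x:x + patch]
            with torch.no_grad():
                restored = model(tile)
            out[:, :, y:y + patch, x:x + patch] += restored
            weight[:, :, y:y + patch, x:x + patch] += 1.0
    return out / weight

Larger overlaps reduce visible seams at the cost of more compute per frame.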

I hope this response was helpful, and if you have any further questions, please feel free to let me know.

ngaer commented 6 months ago

Thanks for the details. But can you be more specific about what exactly needs to be adjusted in the pixel shuffle layer?

Also, as I understand it, we can't use the current weights to perform deblurring without upscaling, right? If we just set the scale factor to 1, it won't work; we need to retrain the model first, right?

GeunhyukYouk commented 6 months ago

The existing code performs 4x upsampling in the pixel shuffle layer regardless of the scale value, so it needs to be modified. It can be adjusted simply by removing the pixel shuffle, as follows.

import torch
import torch.nn as nn

class PixelShuffleBlock(torch.nn.Module):
    """Scale = 1 variant: the pixel shuffle (and with it the 4x
    upsampling) is removed, leaving a plain conv + activation that
    preserves the input resolution."""

    def __init__(self, channels, bias):
        super(PixelShuffleBlock, self).__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, stride=1, bias=bias)
        self.relu = nn.LeakyReLU(negative_slope=0.2, inplace=True)

    def forward(self, x):
        # No nn.PixelShuffle call here, so spatial size is unchanged.
        x = self.relu(self.conv(x))
        return x
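For context, the 4x path being removed typically stacks two conv + 2x pixel shuffle stages, roughly like the sketch below (an illustrative example of the common pattern, not the exact repository code):

import torch.nn as nn

class PixelShuffleBlock4x(nn.Module):
    """Illustrative 4x upsampler: two stacked (conv -> 2x pixel shuffle) stages."""

    def __init__(self, channels, bias):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1, bias=bias)
        self.conv2 = nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1, bias=bias)
        self.shuffle = nn.PixelShuffle(2)  # folds 4C channels into a 2x larger C-channel grid
        self.relu = nn.LeakyReLU(negative_slope=0.2, inplace=True)

    def forward(self, x):
        x = self.relu(self.shuffle(self.conv1(x)))  # 2x
        x = self.relu(self.shuffle(self.conv2(x)))  # 2x again, 4x total
        return x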

Also, as you mentioned, since the provided pretrained model was trained for 4x VSRDB (joint video super-resolution and deblurring), it needs to be retrained for scale = 1.
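Before retraining, a quick shape check confirms the modified block no longer upsamples (a minimal sketch using the scale = 1 class above; the channel count is an arbitrary placeholder):

import torch

block = PixelShuffleBlock(channels=64, bias=True)  # modified scale = 1 block
x = torch.randn(1, 64, 180, 320)                   # dummy feature map
assert block(x).shape == x.shape                   # resolution preserved: deblurring only, no SR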

GeunhyukYouk commented 4 months ago

I will close this issue as there has been no further discussion. Please re-open the issue if there are additional comments.