MCG-NJU / EMA-VFI

[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
Apache License 2.0

Be faster and better #13

Open tomcup opened 1 year ago

tomcup commented 1 year ago

Try not depending on timm; then it would be simple to port the model to C/C++. I also find initialization very slow, and I don't know why. In the example it seems the CPU is used; is there a special reason for not using the GPU? GPUs are usually preferred for video processing.
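For what it's worth, a minimal device-selection sketch (the names net and imgs here are placeholders, not the repo's API):

import torch

# Prefer the GPU when one is available; fall back to the CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical usage: move the model and the input frames to that
# device before running inference.
# net = net.to(device)
# imgs = imgs.to(device)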

I've tried the demo on my GTX 1650: --n=8 takes 30 s and --n=32 takes 2 min (excluding initialization time). That averages about 3.75 s per frame, much slower than RIFE (VapourSynth-RIFE-ncnn-Vulkan, rife-v4.6 with ensemble=True: converting a 2 h 1080p film from 24 fps to 60 fps took about 14 h, an average of 0.12 s per frame). Why is that?
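Since CUDA launches are asynchronous, wall-clock numbers can mislead; here is a rough per-frame timing sketch with explicit synchronization (net and imgs stand in for the loaded model and a padded input pair, as in the script further down):

import time
import torch

def time_per_frame(net, imgs, iters=10):
    with torch.no_grad():
        net(imgs)  # warm-up pass, excluded from the measurement
    torch.cuda.synchronize()  # flush all queued GPU work
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(iters):
            net(imgs)
    torch.cuda.synchronize()  # wait until every forward pass has finished
    return (time.perf_counter() - start) / iters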

tomcup commented 1 year ago

I used screenshots from animated films. I don't expect this model to work well on animation, but I wanted to see how it handles dark scenes and railings.

These are the two frames I used (attached screenshots): mpv-shot0001, mpv-shot0002

But the GPU usage surprised me (screenshot attached). I just ran python .\demo_Nx.py --n 3, and the GPU behaves like this?

Then I tried this:

import sys

import cv2
import numpy as np
import torch
from imageio import mimsave

sys.path.append(".")
import config as cfg
from benchmark.utils.padder import InputPadder
from model import feature_extractor, flow_estimation

# Read the two input frames (BGR, HWC, uint8).
I0 = cv2.imread("example/mpv-shot0001.jpg")
I2 = cv2.imread("example/mpv-shot0002.jpg")

# HWC uint8 -> 1 x C x H x W float in [0, 1], on the GPU.
I0_ = (torch.tensor(I0.transpose(2, 0, 1)).cuda() / 255.0).unsqueeze(0)
I2_ = (torch.tensor(I2.transpose(2, 0, 1)).cuda() / 255.0).unsqueeze(0)

# Pad so height and width are divisible by 32.
padder = InputPadder(I0_.shape, divisor=32)
I0_, I2_ = padder.pad(I0_, I2_)

backbonetype, multiscaletype = feature_extractor, flow_estimation
# Smaller variant:
# backbonecfg, multiscalecfg = cfg.init_model_config(F=16, depth=[2, 2, 2, 2, 2])
backbonecfg, multiscalecfg = cfg.init_model_config(F=32, depth=[2, 2, 2, 4, 4])
net = multiscaletype(backbonetype(**backbonecfg), **multiscalecfg)

def convert(param):
    # Strip the "module." prefix left by DataParallel checkpoints and
    # drop resolution-dependent buffers ("attn_mask", "HW").
    return {
        k.replace("module.", ""): v
        for k, v in param.items()
        if "module." in k and "attn_mask" not in k and "HW" not in k
    }

net.load_state_dict(convert(torch.load("ckpt/ours_t.pkl")))
net.eval()
net.to(torch.device("cuda"))

# Concatenate both frames along the channel axis and run the model.
imgs = torch.cat((I0_, I2_), 1)
pred = net(imgs)

# Unpad, move back to the CPU, and convert to HWC uint8.
mid = (
    padder.unpad(pred)[0]
    .detach()
    .cpu()
    .numpy()
    .transpose(1, 2, 0)
    * 255.0
).astype(np.uint8)
# OpenCV loads BGR; flip to RGB before saving.
mimsave("example/out_2x.jpg", [mid[:, :, ::-1]])

The result is even funnier:

torch.cuda.OutOfMemoryError: CUDA out of memory. 
Tried to allocate 44.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 0 bytes is free. 
Of the allocated memory 9.82 GiB is allocated by PyTorch, and 186.06 MiB is reserved by PyTorch but unallocated. 
If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How can two ordinary 1080p movie screenshots run out of GPU memory?
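One plausible factor, offered as a sketch rather than a confirmed fix: the forward pass above runs with autograd enabled, so PyTorch keeps every intermediate activation of the 1080p input alive for a backward pass that never comes. Wrapping inference in torch.no_grad() (names match the script above) usually reduces memory substantially:

with torch.no_grad():
    # No autograd graph is built, so activations are freed
    # as soon as each layer finishes.
    pred = net(imgs)

# Optional, assuming the model is numerically stable in fp16:
# half precision roughly halves activation memory.
# net = net.half()
# imgs = imgs.half()

# The error message itself also suggests tuning the allocator, e.g. on Windows:
#   set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128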