MCG-NJU / EMA-VFI

[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
Apache License 2.0

Be faster and better #13

Open tomcup opened 1 year ago

tomcup commented 1 year ago

Try not depending on timm; then it would be simple to port the model to C/C++. I also find initialization very slow, and I don't know why. In the example it seems the CPU is used; is there a special reason for not using the GPU? GPUs are usually preferred for video processing.
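For what it's worth, a minimal device-selection sketch (the names net and imgs here are placeholders, not the repo's API):

import torch

# Prefer the GPU when one is available; fall back to the CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical usage: move the model and the input frames to that
# device before running inference.
# net = net.to(device)
# imgs = imgs.to(device)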

I've tried the demo on my GTX 1650: --n=8 takes 30 s and --n=32 takes 2 min (excluding initialization time). That averages about 3.75 s per frame, much slower than RIFE (VapourSynth-RIFE-ncnn-Vulkan, rife-v4.6 with ensemble=True: converting a 2 h 1080p film from 24 fps to 60 fps took about 14 h, an average of 0.12 s per frame). Why is that?
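Since CUDA launches are asynchronous, wall-clock numbers can mislead; here is a rough per-frame timing sketch with explicit synchronization (net and imgs stand in for the loaded model and a padded input pair, as in the script further down):

import time
import torch

def time_per_frame(net, imgs, iters=10):
    with torch.no_grad():
        net(imgs)  # warm-up pass, excluded from the measurement
    torch.cuda.synchronize()  # flush all queued GPU work
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(iters):
            net(imgs)
    torch.cuda.synchronize()  # wait until every forward pass has finished
    return (time.perf_counter() - start) / iters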

tomcup commented 1 year ago

I used screenshots from animated films. I don't expect this model to work well on animation, but I wanted to see how it handles dark scenes and railings.

These are the two frames I used (attached screenshots): mpv-shot0001, mpv-shot0002

But the GPU usage surprised me (screenshot attached). I just ran python .\demo_Nx.py --n 3, and the GPU behaves like this?

Then I tried this:

import sys

import cv2
import numpy as np
import torch
from imageio import mimsave

sys.path.append(".")
import config as cfg
from benchmark.utils.padder import InputPadder
from model import feature_extractor, flow_estimation

# Read the two input frames (BGR, HWC, uint8).
I0 = cv2.imread("example/mpv-shot0001.jpg")
I2 = cv2.imread("example/mpv-shot0002.jpg")

# HWC uint8 -> 1 x C x H x W float in [0, 1], on the GPU.
I0_ = (torch.tensor(I0.transpose(2, 0, 1)).cuda() / 255.0).unsqueeze(0)
I2_ = (torch.tensor(I2.transpose(2, 0, 1)).cuda() / 255.0).unsqueeze(0)

# Pad so height and width are divisible by 32.
padder = InputPadder(I0_.shape, divisor=32)
I0_, I2_ = padder.pad(I0_, I2_)

backbonetype, multiscaletype = feature_extractor, flow_estimation
# Smaller variant:
# backbonecfg, multiscalecfg = cfg.init_model_config(F=16, depth=[2, 2, 2, 2, 2])
backbonecfg, multiscalecfg = cfg.init_model_config(F=32, depth=[2, 2, 2, 4, 4])
net = multiscaletype(backbonetype(**backbonecfg), **multiscalecfg)

def convert(param):
    # Strip the "module." prefix left by DataParallel checkpoints and
    # drop resolution-dependent buffers ("attn_mask", "HW").
    return {
        k.replace("module.", ""): v
        for k, v in param.items()
        if "module." in k and "attn_mask" not in k and "HW" not in k
    }

net.load_state_dict(convert(torch.load("ckpt/ours_t.pkl")))
net.eval()
net.to(torch.device("cuda"))

# Concatenate both frames along the channel axis and run the model.
imgs = torch.cat((I0_, I2_), 1)
pred = net(imgs)

# Unpad, move back to the CPU, and convert to HWC uint8.
mid = (
    padder.unpad(pred)[0]
    .detach()
    .cpu()
    .numpy()
    .transpose(1, 2, 0)
    * 255.0
).astype(np.uint8)
# OpenCV loads BGR; flip to RGB before saving.
mimsave("example/out_2x.jpg", [mid[:, :, ::-1]])

The result is even funnier:

torch.cuda.OutOfMemoryError: CUDA out of memory. 
Tried to allocate 44.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 0 bytes is free. 
Of the allocated memory 9.82 GiB is allocated by PyTorch, and 186.06 MiB is reserved by PyTorch but unallocated. 
If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How can two ordinary 1080p movie screenshots run out of GPU memory?
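One plausible factor, offered as a sketch rather than a confirmed fix: the forward pass above runs with autograd enabled, so PyTorch keeps every intermediate activation of the 1080p input alive for a backward pass that never comes. Wrapping inference in torch.no_grad() (names match the script above) usually reduces memory substantially:

with torch.no_grad():
    # No autograd graph is built, so activations are freed
    # as soon as each layer finishes.
    pred = net(imgs)

# Optional, assuming the model is numerically stable in fp16:
# half precision roughly halves activation memory.
# net = net.half()
# imgs = imgs.half()

# The error message itself also suggests tuning the allocator, e.g. on Windows:
#   set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128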