HolyWu / vs-rife

RIFE function for VapourSynth
MIT License

4K video stuttering with this as filter in MPV #49

Closed. F0903 closed this issue 3 months ago

F0903 commented 3 months ago

I simply cannot get 4K video to work properly, no matter how many parameters I change. It works fine for the first second, then goes wildly down the drain, as if it were a 4 fps video.

This is despite tweaking the different parameters and trying the lowest values possible. MPV also reports that the video fps is ~60, and my GPU isn't always maxed out, so the issue is probably not that it can't keep up? Weirdly though, MPV reports that the estimated display refresh rate dips to around 13, which I have not seen happen before.

Current VapourSynth script:

# Loosely based on https://raw.githubusercontent.com/hooke007/MPV_lazy/50ebf2c6570aa2db45bb158c3da2cbc8e3fb013e/portable_config/vs/rife_2x.vpy

from fractions import Fraction
import vapoursynth as vs
from vsrife import rife

output_format = vs.YUV444P10 # YUV444P10 (or YUV420P10) for 10-bit color; YUV420P8 for 8-bit color.
display_peak_brightness=800

# Disable RIFE when above threshold.
# Skip interpolation for >4K or >=60 Hz content due to performance
disable_fps_threshold = 59
disable_width_threshold = 3840
disable_height_threshold = 2160

# Switch to half scale when above threshold.
reduced_scale_width_threshold = disable_width_threshold * 0.85
reduced_scale_height_threshold = disable_height_threshold * 0.85
reduced_scale = 0.5

# Uses TensorRT (NVIDIA Tensor Cores). Requires an NVIDIA RTX GPU (not tested on anything other than an RTX 4080).
# Building a TensorRT engine for each resolution and config takes a long time, but inference is much faster than the regular path.
gpu_tensorrt=True
tensorrt_debug=True # Enable for TensorRT debug logging.

gpu_streams=4 # This setting very quickly explodes your VRAM usage.
gpu_format=vs.RGBH # RGBH is faster, RGBS is more accurate
ensemble=True # Produces better results but is also more expensive.

core = vs.core
clip = video_in

# Helper for approximate float comparison (currently unused in this script).
def aprox(num, target, margin=0.05):
    return target - margin <= num <= target + margin

if not (clip.width > disable_width_threshold or clip.height > disable_height_threshold or container_fps > disable_fps_threshold):
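    # Tag scene changes with MVTools (sets scene-change frame props); rife is called with sc=True below so frames across cuts are not interpolated.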
    sup  = core.mv.Super(clip, pel=1)
    vec = core.mv.Analyse(sup, blksize=8, isb=True)
    clip = core.mv.SCDetection(clip=clip, vectors=vec, thscd1=240, thscd2=130)

    target_fps = 60
    target_frac = Fraction(target_fps / container_fps).limit_denominator(100)
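    # e.g. a 30 fps source gives 60/30 = 2, i.e. Fraction(2, 1) -> 2x interpolation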

    print(f"Target frac={target_frac.numerator}/{target_frac.denominator}={target_frac.numerator/target_frac.denominator}")

    uhd = clip.width >= reduced_scale_width_threshold and clip.height >= reduced_scale_height_threshold

    if uhd:
        print("Clip is UHD. Adjusting settings...")

    scale = reduced_scale if uhd else 1
    ensemble = False if uhd else ensemble
    print(f"Expensive settings:\ngpu_streams={gpu_streams},scale={scale},ensemble={ensemble},gpu_tensorrt={gpu_tensorrt}")

    # RIFE requires an RGB clip: convert to gpu_format here, interpolate, then convert back to YUV below.
    clip = core.resize.Lanczos(clip=clip, format=gpu_format, matrix_in_s='709')
    clip = rife(clip=clip, model="4.16.lite", factor_num=target_frac.numerator, factor_den=target_frac.denominator, device_index=0, num_streams=gpu_streams, scale=scale, ensemble=ensemble, sc=True, trt=gpu_tensorrt, trt_debug=tensorrt_debug)
    clip = core.resize.Lanczos(clip=clip, format=output_format, matrix_s="2020ncl", nominal_luminance=display_peak_brightness)

    new_fps = Fraction(container_fps * target_frac).limit_denominator(100)
    print(f"New fps_frac={new_fps.numerator}/{new_fps.denominator}={new_fps.numerator/new_fps.denominator}")
    clip = core.std.AssumeFPS(clip=clip, fpsnum=new_fps.numerator, fpsden=new_fps.denominator)

else:
    print("Video is too expensive to interpolate. Ignoring...")

clip.set_output()

HolyWu commented 3 months ago

Have you tried a 1080p video and gotten the same result? Also, can you run a benchmark with vspipe -p ooxx.vpy -- to see what fps you get? You may need to replace video_in with a source filter like core.bs.VideoSource(r"ooxx.mkv") and manually define the container_fps variable.
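
For reference, a standalone stub along those lines might look like this (the file name is just a placeholder and BestSource must be installed; this is a sketch, not part of the original script):

# Hypothetical benchmarking stub: defines the variables mpv would normally inject,
# so the .vpy script can be run standalone under vspipe.
import vapoursynth as vs
core = vs.core

clip = core.bs.VideoSource(r"ooxx.mkv")        # stand-in for mpv's video_in
container_fps = clip.fps_num / clip.fps_den    # stand-in for mpv's container_fps
video_in = clip

# ...followed by the rest of the .vpy script unchanged, then run: vspipe -p ooxx.vpy --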

F0903 commented 3 months ago

Yes, with 1080p and under it works perfectly. I have tried the benchmark, and I'm getting ~40 fps with a 4K30 video interpolated to 60 fps.

HolyWu commented 3 months ago

Since the benchmarked fps (40) is lower than the target interpolated fps (60), you definitely won't be able to play back smoothly in realtime.
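
As a rough sanity check on those numbers (a sketch; the variable names are just for illustration):

# The pipeline produces only about two thirds of the frames needed for realtime 60 fps output.
target_fps = 60     # interpolated output rate requested from rife
pipeline_fps = 40   # throughput measured with vspipe -p
print(f"pipeline sustains {pipeline_fps / target_fps:.0%} of realtime")  # ~67%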

F0903 commented 3 months ago

Yeah, that does make sense, sadly. But does a 40 fps output from rife really cause playback to drop to about 4 fps in MPV?

HolyWu commented 3 months ago

No idea. But decoding (if using a hardware decoder) and rendering also utilize the GPU, and options like --vo or --profile etc. in mpv can also have an impact on performance.

F0903 commented 3 months ago

That is true. I'll investigate it some more. Thank you for your help.

F0903 commented 3 months ago

@HolyWu There is actually one last thing. I cannot see any change when using the scale parameter of the function. Both 0.25 and 1 give me the same performance; is this normal?

HolyWu commented 3 months ago

The scale < 1 performance issue should be fixed in v5.1.0.

F0903 commented 3 months ago

I have benchmarked again with the new v5.1.0 version, but sadly it didn't change anything for me. scale=1 performs the same for me as 0.5 and 0.25.

HolyWu commented 3 months ago

Then I have no idea. I tried scale=<0.25/0.5/1.0> on 1080p and 4K video with my poor RTX 3050 and actually got different speeds.

1080p
scale=1.0:  60.83 fps
scale=0.5:  76.84 fps
scale=0.25: 80.91 fps

4K
scale=1.0:  15.45 fps
scale=0.5:  19.35 fps
scale=0.25: 21.26 fps

F0903 commented 3 months ago

Hmm, alright then :/