AmusementClub / vs-mlrt

Efficient CPU/GPU ML Runtimes for VapourSynth (with built-in support for waifu2x, DPIR, RealESRGANv2/v3, Real-CUGAN, RIFE, SCUNet and more!)
GNU General Public License v3.0
273 stars · 18 forks

input pixel bitdepth not supported with fmtc.resample #32

Closed: ViRb3 closed this issue 1 year ago

ViRb3 commented 1 year ago

I have the following:

backend = Backend.TRT(
    fp16=True,
    output_format=1,
    force_fp16=True,
    device_id=0,
    tf32=False,
    use_cudnn=False,
    num_streams=2,
    workspace=None,
    use_cuda_graph=True,
)
clip = core.ffms2.Source(
    source="/path/to/file.mp4",
    cache=False,
)
clip = core.resize.Lanczos(
    clip,
    format=vs.RGBS,
)
clip = RealESRGANv2(
    clip,
    model=RealESRGANv2Model.animevideov3,
    backend=backend,
    scale=2,
)

I get a crash like:

Traceback (most recent call last):
  File "src\cython\vapoursynth.pyx", line 2866, in vapoursynth._vpy_evaluate
  File "src\cython\vapoursynth.pyx", line 2867, in vapoursynth._vpy_evaluate
  File "inference.py", line 179, in <module>
    clip = RealESRGANv2(
  File "vsmlrt\vsmlrt.py", line 517, in RealESRGAN
    clip = core.fmtc.resample(clip, scale=rescale, kernel="lanczos", taps=4, fh=1/rescale, fv=1/rescale)
  File "src\cython\vapoursynth.pyx", line 2612, in vapoursynth.Function.__call__
vapoursynth.Error: resample: input pixel bitdepth not supported.

I can work around by omitting scale and then manually rescaling like:

clip = core.resize.Lanczos(clip, clip.width/2, clip.height/2)

Why is this happening though?

Thanks

NSQY commented 1 year ago

> Why is this happening though?
>
> Thanks

This may be caused by https://github.com/EleonoreMizo/fmtconv/issues/38. It has been fixed, but a new release has not yet been pushed.

ViRb3 commented 1 year ago

Makes sense, thank you! @WolframRhodium in your fix commit above, can I ask why you are doing:

if rescale > 1:
    clip = core.resize.Lanczos(clip, int(clip_org.width * scale), int(clip_org.height * scale), filter_param_a=4)
else:
    if clip_org.format.bits_per_sample != 32:
        clip = core.resize.Point(clip, format=vs.RGBS)

    clip = core.fmtc.resample(clip, scale=rescale, kernel="lanczos", taps=4, fh=1/rescale, fv=1/rescale)

    if clip_org.format.bits_per_sample == 16:
        clip = core.resize.Point(clip, format=vs.RGBH)

Wouldn't the following single line cover the entire rescale part:

clip = core.resize.Lanczos(clip, int(clip_org.width * scale), int(clip_org.height * scale), filter_param_a=4)

Sub-1 scale would work. All color spaces would work too. Is there any benefit to using fmtconv instead of the built-in functions?

WolframRhodium commented 1 year ago

The cv2.resize used by the author is not an ordinary lanczos-4 resize (it downsamples without anti-aliasing), so it has to be reproduced with fmtc.
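To illustrate the distinction in a minimal, self-contained sketch (plain Python, not the actual fmtc or cv2 code, and the `weights` helper here is hypothetical): an anti-aliasing resizer stretches the Lanczos kernel by 1/scale when downsampling, so each output pixel draws from proportionally more source taps; a non-anti-aliasing resize keeps the kernel width fixed regardless of scale.

```python
import math

def lanczos(x, taps=4):
    # Lanczos windowed sinc: sinc(x) * sinc(x / taps), zero outside |x| < taps
    if x == 0:
        return 1.0
    if abs(x) >= taps:
        return 0.0
    px = math.pi * x
    return taps * math.sin(px) * math.sin(px / taps) / (px * px)

def weights(center, scale, taps=4, antialias=True):
    # Hypothetical helper: source-pixel weights for one output pixel.
    # With anti-aliasing, a downscale widens the kernel support by 1/scale
    # and compresses the kernel argument by `scale`; without it, the kernel
    # is evaluated at unit width no matter the scale factor.
    if antialias and scale < 1:
        support, kscale = taps / scale, scale
    else:
        support, kscale = taps, 1.0
    lo = math.ceil(center - support)
    hi = math.floor(center + support)
    w = [lanczos((center - i) * kscale, taps) for i in range(lo, hi + 1)]
    total = sum(w)
    return [v / total for v in w]  # normalize so weights sum to 1

# Downscale by 2: the anti-aliased kernel covers twice the source taps
aa = weights(center=10.25, scale=0.5, antialias=True)
no_aa = weights(center=10.25, scale=0.5, antialias=False)
print(len(aa), len(no_aa))  # 16 vs 8 contributing source pixels
```

The non-anti-aliased variant (8 taps here) is sharper but aliases high frequencies, which matches the behavior being imitated from cv2.resize.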