AmusementClub / vs-mlrt

Efficient CPU/GPU ML Runtimes for VapourSynth (with built-in support for waifu2x, DPIR, RealESRGANv2/v3, Real-CUGAN, RIFE, SCUNet and more!)
GNU General Public License v3.0
292 stars 20 forks

Suggestion: TTA implementation #16

Closed xurich-xulaco closed 1 year ago

xurich-xulaco commented 1 year ago

I know TTA is not that amazing a feature, yet small image-processing jobs or short videos would benefit from the extra accuracy in their VapourSynth scripts.

WolframRhodium commented 1 year ago

TTA can be implemented by manually rotating and processing the input without the help of vs-mlrt. Isn't that enough?

xurich-xulaco commented 1 year ago

Apart from rotation (and the subsequent de-rotation after mlrt has done its processing), how could it be "averaged"? My first guess is the example of std.Expr(clips=[clipa, clipb, clipc], expr="x y + z + 3 /") from the expr documentation ... would it actually be that simple? :P

WolframRhodium commented 1 year ago

exactly
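For reference, the rotate → process → de-rotate → average idea can be sketched with a numpy analogue (numpy and the pass-through `model` callable here are illustrative stand-ins, not vs-mlrt API; in a real script the model would be a vs-mlrt filter and the averaging would be done with std.Expr):

```python
import numpy as np

def tta_average(img, model):
    """Average a model's output over the 8 dihedral augmentations."""
    outputs = []
    for k in range(4):                    # 0/90/180/270 degree rotations
        rot = np.rot90(img, k)
        outputs.append(np.rot90(model(rot), -k))   # undo the rotation
        flipped = np.fliplr(rot)          # plus a horizontal flip of each
        outputs.append(np.rot90(np.fliplr(model(flipped)), -k))
    return np.mean(outputs, axis=0)
```

With an identity model the result is exactly the input, which is a quick way to confirm the de-transforms are correct.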

xurich-xulaco commented 1 year ago

Sample code, as reference for the future:

import vapoursynth as vs
from vapoursynth import core
from vsmlrt import Waifu2x, Waifu2xModel, Backend

def Waifu2xVulkanTTA(clip: vs.VideoNode, noise: int = -1, scale: int = 2):

    arr = [clip]
    arr.append(arr[0].std.Transpose())
    arr.append(arr[1].std.Transpose())
    arr.append(arr[2].std.Transpose())

    arr.append(arr[0].std.FlipHorizontal())
    arr.append(arr[1].std.FlipVertical())
    arr.append(arr[2].std.FlipHorizontal())
    arr.append(arr[3].std.FlipVertical())

    for x in range(len(arr)):
        arr[x] = Waifu2x(arr[x], noise=noise, scale=scale, model=Waifu2xModel.cunet, backend=Backend.NCNN_VK())

    arr[1] = arr[1].std.Transpose().std.Transpose().std.Transpose()
    arr[2] = arr[2].std.Transpose().std.Transpose()
    arr[3] = arr[3].std.Transpose()

    arr[4] = arr[4].std.FlipHorizontal()
    arr[5] = arr[5].std.FlipVertical().std.Transpose().std.Transpose().std.Transpose()
    arr[6] = arr[6].std.FlipHorizontal().std.Transpose().std.Transpose()
    arr[7] = arr[7].std.FlipVertical().std.Transpose()

    return core.std.Expr(clips=[arr[0], arr[1], arr[2], arr[3], arr[4], arr[5], arr[6], arr[7]], expr="x y + z + a + b + c + d + e + 8 /")
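As a sanity check, the transform/inverse pairs in the snippet above can be verified with a numpy analogue (`.T` standing in for std.Transpose, np.fliplr/np.flipud for FlipHorizontal/FlipVertical; none of this is vs-mlrt API): every de-transformed branch should equal the original input.

```python
import numpy as np

x = np.arange(20.0).reshape(4, 5)

# forward transforms, mirroring how arr[] is built in the snippet above
arr = [x]
arr.append(arr[0].T)
arr.append(arr[1].T)
arr.append(arr[2].T)
arr.append(np.fliplr(arr[0]))
arr.append(np.flipud(arr[1]))
arr.append(np.fliplr(arr[2]))
arr.append(np.flipud(arr[3]))

# inverse transforms, mirroring the de-rotation step in the snippet above
inv = [
    arr[0],
    arr[1].T.T.T,
    arr[2].T.T,
    arr[3].T,
    np.fliplr(arr[4]),
    np.flipud(arr[5]).T.T.T,
    np.fliplr(arr[6]).T.T,
    np.flipud(arr[7]).T,
]
for y in inv:
    assert np.array_equal(y, x)  # every branch round-trips back to the input
```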
AkarinVS commented 1 year ago

No need to create a separate Waifu2x for each clip: either splice or interleave those clips and then create a single Waifu2x filter for all of them. This will save quite a bit of GPU memory (8x reduction) without performance degradation (if GPU utilization is not yet full, increase num_streams).

btw, some of the clips in arr are identical (e.g. arr[0] == arr[2]).
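That observation can be checked with a tiny numpy analogue (`.T` standing in for std.Transpose, np.fliplr for FlipHorizontal; illustrative only): transposing twice is the identity, so arr[2] duplicates arr[0] and arr[6] duplicates arr[4].

```python
import numpy as np

a0 = np.arange(16.0).reshape(4, 4)  # stand-in for the input clip
a1 = a0.T                           # arr[1] = Transpose(arr[0])
a2 = a1.T                           # arr[2] = Transpose(arr[1]) == arr[0]
a3 = a2.T                           # arr[3] = Transpose(arr[2]) == arr[1]

a4 = np.fliplr(a0)                  # arr[4] = FlipHorizontal(arr[0])
a6 = np.fliplr(a2)                  # arr[6] = FlipHorizontal(arr[2]) == arr[4]

assert np.array_equal(a2, a0)       # the duplicate pair noted above
assert np.array_equal(a6, a4)
```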

xurich-xulaco commented 1 year ago

Because Transpose changes the resolution of the original clip, the clips can't easily be stacked into a single clip under VapourSynth's standard splice policies, so I came up with these three functions that might work as a reference for anyone (they sure worked for me, as in no crashing). Beware that this script was written with single-image processing in mind; processing a whole video needs std.Interleave() instead of splicing into a new clip.

import vapoursynth as vs
from vapoursynth import core
from vsmlrt import Waifu2x, Waifu2xModel, Backend

def Waifu2xHalfTTA(clip: vs.VideoNode, noise: int = -1, scale: int = 2):

    arr = [clip]

    arr.append(arr[0].std.FlipHorizontal())
    arr.append(arr[1].std.FlipVertical())
    arr.append(arr[2].std.FlipHorizontal())

    # new = core.std.Interleave([arr[0], arr[1], arr[2], arr[3]]) instead of using a loop to splice everything together
    new = arr[0]

    for x in range(1, 4):
        new += arr[x]

    new = Waifu2x(new, noise=noise, scale=scale, model=Waifu2xModel.cunet, backend=Backend.OV_CPU())

    # Instead of trimming; when processing a whole video, use std.SelectEvery() here instead
    arr[0] = new.std.Trim(0,0)
    arr[1] = new.std.Trim(1,1).std.FlipHorizontal()
    arr[2] = new.std.Trim(2,2).std.FlipHorizontal().std.FlipVertical()
    arr[3] = new.std.Trim(3,3).std.FlipHorizontal().std.FlipVertical().std.FlipHorizontal()

    return core.std.Expr(clips=[arr[0], arr[1], arr[2], arr[3]], expr="x y + z + a + 4 /")
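The Interleave/SelectEvery routing mentioned in the comments can be sketched with plain Python lists; `interleave` and `select_every` below are hypothetical stand-ins for std.Interleave and std.SelectEvery, just to show how frames are routed through a single filter instance and split back out:

```python
from typing import List

def interleave(clips: List[list]) -> list:
    """a0 b0 c0 d0 a1 b1 c1 d1 ... -- like std.Interleave."""
    return [frame for frames in zip(*clips) for frame in frames]

def select_every(clip: list, cycle: int, offset: int) -> list:
    """Take one frame per cycle -- like std.SelectEvery(cycle, offset)."""
    return clip[offset::cycle]

# four augmented "clips" of the same three-frame video
clips = [[f"v{i}_f{n}" for n in range(3)] for i in range(4)]
merged = interleave(clips)          # one clip with 4x the frame count
# run the model once over `merged`, then split the results back out:
parts = [select_every(merged, 4, k) for k in range(4)]
assert parts == clips
```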

def Waifu2xTransposeHalfTTA(clip: vs.VideoNode, noise: int = -1, scale: int = 2):

    arr = [clip.std.Transpose()]

    arr.append(arr[0].std.FlipHorizontal())
    arr.append(arr[1].std.FlipVertical())
    arr.append(arr[2].std.FlipHorizontal())

    new = arr[0]

    for x in range(1, 4):
        new += arr[x]

    new = Waifu2x(new, noise=noise, scale=scale, model=Waifu2xModel.cunet, backend=Backend.OV_CPU())

    arr[0] = new.std.Trim(0,0)
    arr[1] = new.std.Trim(1,1).std.FlipHorizontal()
    arr[2] = new.std.Trim(2,2).std.FlipHorizontal().std.FlipVertical()
    arr[3] = new.std.Trim(3,3).std.FlipHorizontal().std.FlipVertical().std.FlipHorizontal()

    return core.std.Expr(clips=[arr[0], arr[1], arr[2], arr[3]], expr="x y + z + a + 4 /").std.Transpose()

def Waifu2xTTA(clip, noise: int = -1, scale: int = 2):
    WoTranspose = Waifu2xHalfTTA(clip, noise, scale)
    Transposed = Waifu2xTransposeHalfTTA(clip, noise, scale)

    return core.std.Expr(clips=[WoTranspose, Transposed], expr="x y + 2 /")

Now that I think about it, std.Interleave(mismatch=True) might circumvent the splice problem, although I'm not sure whether vsmlrt would handle it properly.