NevermindNilas / TheAnimeScripter

Welcome to TheAnimeScripter – the ultimate tool for Video Upscaling, Interpolating and many more. Available as a CLI, GUI and Adobe Extension.

Unable to use TensorRT RIFE #43

Closed: przemoc closed this issue 1 month ago

przemoc commented 1 month ago

I haven't played with TensorRT at all so far, so maybe I'm doing something wrong. This is TAS v1.9.0; I tried both RIFE4.17 and RIFE4.18.

PS C:\Users\przemoc> C:\Tools\TAS\main.exe --input "D:\python\stable-diffusion-workspace\animation\0001-epg_z-beginning-of-the-universe\concat.mp4" --output "D:\python\stable-diffusion-workspace\animation\0001-epg_z-beginning-of-the-universe\concat-20x-TAS-RIFE4.18-ensemble.mp4" --interpolate --interpolate_factor 20 --interpolate_method rife4.18-tensorrt --ensemble --encode_method x264
Processing D:\python\stable-diffusion-workspace\animation\0001-epg_z-beginning-of-the-universe\concat.mp4
UHD and fp16 are not compatible with RIFE, defaulting to fp32
Downloading Rife4.18-tensorrt model |████████████████████████████████████████| 21MB/21MB [100%] in 0.9s
Downloaded Rife4.18-tensorrt model to: C:\Users\przemoc\AppData\Roaming\TheAnimeScripter\weights\rife4.18-tensorrt\rife418_v2_ensembleTrue_op20_clamp_onnxslim.onnx
Engine not found, creating dynamic engine for model: C:\Users\przemoc\AppData\Roaming\TheAnimeScripter\weights\rife4.18-tensorrt\rife418_v2_ensembleTrue_op20_clamp_onnxslim.onnx, this may take a while, but it is worth the wait...
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Configuring with profiles:[
        Profile 0:
            {input [min=(1, 7, 32, 32), opt=(1, 7, 1280, 1280), max=(1, 7, 2160, 3840)]}
    ]
[E] 3: [builderConfig.cpp::nvinfer1::BuilderConfig::getFlag::67] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/builderConfig.cpp::nvinfer1::BuilderConfig::getFlag::67, condition: int(builderFlag) >= 0 && int(builderFlag) < EnumMax<BuilderFlag>() )
[E] 3: [builderConfig.cpp::nvinfer1::BuilderConfig::getFlag::67] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/builderConfig.cpp::nvinfer1::BuilderConfig::getFlag::67, condition: int(builderFlag) >= 0 && int(builderFlag) < EnumMax<BuilderFlag>() )
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.STANDARD
    Memory Pools           | [WORKSPACE: 6143.69 MiB, TACTIC_DRAM: 6143.69 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
    Tactic Sources         | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [PROFILE_SHARING_0806]
[E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)
[W] Requested amount of GPU memory (5315231744 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[W] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 5315231744 detected for tactic 0x0000000000000000.
[E] 10: Could not find any implementation for node {ForeignNode[/Constant_105_output_0 + ONNXTRT_Broadcast_466.../Concat_35]}.
[E] 10: [optimizer.cpp::nvinfer1::builder::cgraph::LeafCNode::computeCosts::4105] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/Constant_105_output_0 + ONNXTRT_Broadcast_466.../Concat_35]}.)
[!] Invalid Engine. Please ensure the engine was built correctly
PS C:\Users\przemoc> C:\Tools\TAS\main.exe --input "D:\python\stable-diffusion-workspace\animation\0001-epg_z-beginning-of-the-universe\concat.mp4" --output "D:\python\stable-diffusion-workspace\animation\0001-epg_z-beginning-of-the-universe\concat-20x-TAS-RIFE4.17-ensemble.mp4" --interpolate --interpolate_factor 20 --interpolate_method rife4.17-tensorrt --ensemble --encode_method x264
Processing D:\python\stable-diffusion-workspace\animation\0001-epg_z-beginning-of-the-universe\concat.mp4
UHD and fp16 are not compatible with RIFE, defaulting to fp32
Downloading Rife4.17-tensorrt model |████████████████████████████████████████| 21MB/21MB [100%] in 1.7s
Downloaded Rife4.17-tensorrt model to: C:\Users\przemoc\AppData\Roaming\TheAnimeScripter\weights\rife4.17-tensorrt\rife417_v2_ensembleTrue_op20_clamp_onnxslim.onnx
Engine not found, creating dynamic engine for model: C:\Users\przemoc\AppData\Roaming\TheAnimeScripter\weights\rife4.17-tensorrt\rife417_v2_ensembleTrue_op20_clamp_onnxslim.onnx, this may take a while, but it is worth the wait...
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Configuring with profiles:[
        Profile 0:
            {input [min=(1, 7, 32, 32), opt=(1, 7, 1280, 1280), max=(1, 7, 2160, 3840)]}
    ]
[E] 3: [builderConfig.cpp::nvinfer1::BuilderConfig::getFlag::67] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/builderConfig.cpp::nvinfer1::BuilderConfig::getFlag::67, condition: int(builderFlag) >= 0 && int(builderFlag) < EnumMax<BuilderFlag>() )
[E] 3: [builderConfig.cpp::nvinfer1::BuilderConfig::getFlag::67] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/builderConfig.cpp::nvinfer1::BuilderConfig::getFlag::67, condition: int(builderFlag) >= 0 && int(builderFlag) < EnumMax<BuilderFlag>() )
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.STANDARD
    Memory Pools           | [WORKSPACE: 6143.69 MiB, TACTIC_DRAM: 6143.69 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
    Tactic Sources         | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [PROFILE_SHARING_0806]
[E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)
[W] Requested amount of GPU memory (5315231744 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[W] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 5315231744 detected for tactic 0x0000000000000000.
[E] 10: Could not find any implementation for node {ForeignNode[/Constant_105_output_0 + ONNXTRT_Broadcast_470.../Concat_35]}.
[E] 10: [optimizer.cpp::nvinfer1::builder::cgraph::LeafCNode::computeCosts::4105] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/Constant_105_output_0 + ONNXTRT_Broadcast_470.../Concat_35]}.)
[!] Invalid Engine. Please ensure the engine was built correctly

I checked nvidia-smi, and my RTX 2060 with 6 GB VRAM should have the requested 5069 MB free: the OS plus the rest of the apps occupy around 964 MB, which gives 6144 - 964 = 5180 MB.
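For reference, the 5315231744 bytes that TensorRT tried to allocate is about 5069 MiB, which is only slightly below that estimated 5180 MB of free VRAM. A quick way to see what is actually available (assuming nvidia-smi is on the PATH) is:

nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv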

NevermindNilas commented 1 month ago

Since the height is above 1080 (1280p in your case), it defaulted to FP32 in order to avoid certain pixelation issues seen at resolutions above 1920x1080. I am not sure if that applies to 1280x1280, but TAS is hardcoded to work that way (if width > 1920 or height > 1080, switch to UHD mode / FP32; 1280 > 1080, therefore UHD mode is on).
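For illustration only, here is a minimal sh sketch of that rule as described above (this is not the actual TAS code; ffprobe and the input.mp4 path are placeholders):

#!/bin/sh
# Probe the input's dimensions and apply the described threshold:
# width > 1920 or height > 1080 => UHD mode / FP32, otherwise FP16.
dims=$(ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=p=0 input.mp4)
width=${dims%,*}
height=${dims#*,}
if [ "$width" -gt 1920 ] || [ "$height" -gt 1080 ]; then
    echo "UHD mode: forcing FP32 (${width}x${height})"
else
    echo "FP16 is fine (${width}x${height})"
fi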

FP32 uses roughly 2x more VRAM than FP16, so it ended up OOM-ing on VRAM. I am not sure why your nvidia-smi didn't show it, but it's clearly a VRAM issue, coupled with some concat op that apparently isn't supported.

Sadly I cannot provide a fix for now other than lowering the base resolution to 1080x1080 and/or using CUDA.
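A minimal sketch of the downscaling workaround with ffmpeg (the paths, the 1080x1080 target, and the encoder settings are just placeholders for this example):

# Downscale below the 1920x1080 threshold before handing the file to TAS.
ffmpeg -i concat.mp4 -vf "scale=1080:1080:flags=lanczos" -c:v libx264 -crf 0 -pix_fmt yuv444p concat-1080.mp4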

przemoc commented 1 month ago

Thank you for the details. Next time I will try a lower resolution. The CUDA and NCNN variants were working fine in my short testing (NCNN was slower, though).


My concat.mp4 file was the output of ffmpeg's concat demuxer, basically combining multiple images into a video:

#!/bin/sh

# Turn the PNG frames (oldest first) into concat-demuxer entries of the form
# file 'name.png', then append the first frame once more at the end.
ls -1rt *.png | sed "s,^,file ',;s,$,'," >ffmpeg-concat.txt
ls -1rt *.png | sed "s,^,file ',;s,$,',;1q" >>ffmpeg-concat.txt

# Assemble the listed frames into a lossless 3 fps video.
ffmpeg -f concat -safe 0 -i ffmpeg-concat.txt -c:v libx264 -crf 0 -pix_fmt yuv444p -vf "settb=AVTB,setpts=N/3/TB,fps=3" -movflags +faststart concat.mp4

I was using concat because, in the past, I was also playing with ffmpeg's minterpolate as a secondary step.
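For context, a minterpolate pass of that kind would look roughly like this (the fps target, mode, and CRF are illustrative, not the exact parameters that were used):

ffmpeg -i concat.mp4 -vf "minterpolate=fps=60:mi_mode=mci" -c:v libx264 -crf 18 interpolated.mp4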

NevermindNilas commented 1 month ago

I was referring to this part of the logs: "ONNXTRT_Broadcast_470.../Concat_35"

If I remember correctly from my own testing, a rule of thumb is:

TensorRT > CUDA >= DirectML (if it works) > NCNN.

So your results are to be expected.

NevermindNilas commented 1 month ago

Should be fixed with https://github.com/NevermindNilas/TheAnimeScripter/commit/e6df139cb5702b0f89ae4db84becba4f23a544e1