AmusementClub / vs-mlrt

Efficient CPU/GPU ML Runtimes for VapourSynth (with built-in support for waifu2x, DPIR, RealESRGANv2/v3, Real-CUGAN, RIFE, SCUNet and more!)
GNU General Public License v3.0

RIFE v4.14 "lite" version is heavier than regular RIFE v4.14 #81

Closed netExtra closed 8 months ago

netExtra commented 8 months ago

As the title says. My GPU usage is around 92-98% with RIFE v4.15, but it is locked at 100% with RIFE v4.14 lite (v1 and v2).

WolframRhodium commented 8 months ago

As has already been said before, this kind of feedback is best reported to the author.

Unlike previous models, the RIFE 4.14 lite model uses an operation called grouped convolution, whose performance is usually limited by memory bandwidth rather than computational resources.
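To see why grouped convolution tends to be bandwidth-bound, here is a rough back-of-the-envelope sketch (the layer sizes below are illustrative, not taken from the actual RIFE network): grouping divides the multiply-add count by `groups`, but the activation traffic through memory stays the same, so the FLOPs-per-byte ratio drops and the GPU stalls on memory instead of compute.

```python
def conv_stats(h, w, c_in, c_out, k=3, groups=1, dtype_bytes=2):
    """Estimate FLOPs, memory traffic, and arithmetic intensity of one conv layer."""
    # Each output element needs (c_in / groups) * k * k multiply-adds (2 FLOPs each).
    flops = 2 * h * w * c_out * (c_in // groups) * k * k
    # Lower bound on memory traffic: read weights and input, write output (fp16 = 2 bytes).
    weight_bytes = c_out * (c_in // groups) * k * k * dtype_bytes
    act_bytes = (c_in + c_out) * h * w * dtype_bytes
    bytes_moved = weight_bytes + act_bytes
    return flops, bytes_moved, flops / bytes_moved  # last value: FLOPs per byte

# Hypothetical 64-channel 3x3 layer at 1920x1088 (the resolution from the log above).
dense = conv_stats(1088, 1920, 64, 64, groups=1)
grouped = conv_stats(1088, 1920, 64, 64, groups=8)
print(f"dense   conv: {dense[2]:6.1f} FLOPs/byte")
print(f"grouped conv: {grouped[2]:6.1f} FLOPs/byte")
```

With 8 groups the arithmetic intensity falls roughly 8x: the same bytes move through memory while the compute per byte shrinks, which is why utilization pins at 100% without a matching speedup.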

netExtra commented 8 months ago

So that would explain why I'm getting lots of Convolution tactic errors I've never seen before. See below.

```
[01/16/2024-14:50:26] [I] Skipped setting output types for some layers. Check verbose logs for more details.
[01/16/2024-14:50:26] [W] [TRT] Could not read timing cache from: C:/Program Files (x86)/SVP 4/rife\models\rife\rife_v4.14_lite.onnx.1920x1088_fp16_no-tf32_workspace8192_trt-9200_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4080_8ce99e37.engine.cache. A new timing cache will be generated and written.
[01/16/2024-14:50:26] [I] [TRT] Global timing cache in use. Profiling results in this builder pass will be stored.
[01/16/2024-14:50:44] [W] [TRT] Cache result detected as invalid for node: /block0/convblock/convblock.1/conv/Conv + block0.convblock.1.beta + /block0/convblock/convblock.1/Mul + /block0/convblock/convblock.1/Add + PWN(/block0/convblock/convblock.1/relu/LeakyRelu), LayerImpl: CaskConvolution, tactic: 0xecff17b04e8a0aaf
[01/16/2024-14:50:45] [W] [TRT] Cache result detected as invalid for node: /block0/convblock/convblock.2/conv/Conv + block0.convblock.2.beta + /block0/convblock/convblock.2/Mul + /block0/convblock/convblock.2/Add + PWN(/block0/convblock/convblock.2/relu/LeakyRelu), LayerImpl: CaskConvolution, tactic: 0xecff17b04e8a0aaf
[01/16/2024-14:50:45] [W] [TRT] Cache result detected as invalid for node: /block0/convblock/convblock.3/conv/Conv + block0.convblock.3.beta + /block0/convblock/convblock.3/Mul + /block0/convblock/convblock.3/Add + PWN(/block0/convblock/convblock.3/relu/LeakyRelu), LayerImpl: CaskConvolution, tactic: 0xecff17b04e8a0aaf
[01/16/2024-14:50:45] [W] [TRT] Cache result detected as invalid for node: /block0/convblock/convblock.4/conv/Conv + block0.convblock.4.beta + /block0/convblock/convblock.4/Mul + /block0/convblock/convblock.4/Add + PWN(/block0/convblock/convblock.4/relu/LeakyRelu), LayerImpl: CaskConvolution, tactic: 0xecff17b04e8a0aaf
[01/16/2024-14:50:46] [W] [TRT] Cache result detected as invalid for node: /block0/convblock/convblock.5/conv/Conv + block0.convblock.5.beta + /block0/convblock/convblock.5/Mul + /block0/convblock/convblock.5/Add + PWN(/block0/convblock/convblock.5/relu/LeakyRelu), LayerImpl: CaskConvolution, tactic: 0xecff17b04e8a0aaf
[01/16/2024-14:50:46] [W] [TRT] Cache result detected as invalid for node: /block0/convblock/convblock.6/conv/Conv + block0.convblock.6.beta + /block0/convblock/convblock.6/Mul + /block0/convblock/convblock.6/Add + PWN(/block0/convblock/convblock.6/relu/LeakyRelu), LayerImpl: CaskConvolution, tactic: 0xecff17b04e8a0aaf
```

netExtra commented 8 months ago

> As has already been said before, this kind of feedback is best reported to the author.
>
> Unlike previous models, the RIFE 4.14 lite model uses an operation called grouped convolution, whose performance is usually limited by memory bandwidth rather than computational resources.

GPU memory bandwidth I assume?

WolframRhodium commented 8 months ago

Yes, GPU memory bandwidth.