bjin / mpv-prescalers

prescalers for mpv, as user shaders
GNU Lesser General Public License v3.0

speedup nnedi3 with cooperative matrix multiplication #59

Open bjin opened 1 year ago

bjin commented 1 year ago

Vulkan 1.3.255 was released with a new vendor-neutral extension, VK_KHR_cooperative_matrix, for tensor-core-like fast matrix multiplication, which could be used to speed up nnedi3. A basic 16x8x8 fp16 coopMatMulAdd should be enough. According to some perf stats I found elsewhere, a 2x to 3x speedup could be expected.
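For reference, here is a minimal CPU-side sketch of the operation a cooperative matrix multiply-add performs on one tile: D = A * B + C. It is plain Python, not shader code. Reading "16x8x8" as M x N x K is an assumption, and the function name `mat_mul_add` and the test data are made up for illustration.

```python
def mat_mul_add(a, b, c):
    """Return a * b + c for row-major lists of lists.

    a is M x K, b is K x N, c is M x N; the result is M x N.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "a must be M x K"
    assert len(c) == m and all(len(row) == n for row in c), "c must be M x N"
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


# Assumed interpretation of "16x8x8": M=16 rows, N=8 columns, K=8 inner dimension.
M, N, K = 16, 8, 8

# Row i of a is a one-hot vector that selects row (i % K) of b.
a = [[1.0 if i % K == p else 0.0 for p in range(K)] for i in range(M)]
b = [[float(p * N + j) for j in range(N)] for p in range(K)]
c = [[1.0] * N for _ in range(M)]

d = mat_mul_add(a, b, c)
print(len(d), len(d[0]))  # tile shape of the result: M rows, N columns
```

On hardware the whole subgroup cooperates on one such tile, which is where the expected speedup over per-invocation dot products would come from.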

But first, this has to be put on hold until AMD implements this extension in their Linux driver (or maybe radv will get there first?).

bjin commented 11 months ago

radv (amd): https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683
anv (intel): https://gitlab.freedesktop.org/mesa/mesa/-/issues/9250

I only have an AMD RDNA3 (GFX11+) GPU for testing, and according to the RADV merge request above, the supported coopMatMul type is 16x16x16 (opcode: v_wmma_f32_16x16x16_f16) with a subgroup size of 64. These settings probably won't work on either Intel or NVIDIA cards.
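To make the shape mismatch concrete, here is a rough sketch of what happens if a 16x8x8 problem has to be padded up to a 16x16x16 hardware tile. The helper names `padded_dims` and `utilization` are hypothetical, and the sizes are only the ones mentioned in this thread, not measured nnedi3 workloads.

```python
def padded_dims(m, n, k, tile):
    """Round each of M, N, K up to the next multiple of the tile size."""
    tm, tn, tk = tile

    def round_up(x, t):
        return -(-x // t) * t  # ceiling division, then scale back up

    return round_up(m, tm), round_up(n, tn), round_up(k, tk)


def utilization(m, n, k, tile):
    """Fraction of multiply-adds that do useful work after padding."""
    pm, pn, pk = padded_dims(m, n, k, tile)
    return (m * n * k) / (pm * pn * pk)


wmma_tile = (16, 16, 16)  # shape reported for RADV on RDNA3 in this thread

print(padded_dims(16, 8, 8, wmma_tile))
print(utilization(16, 8, 8, wmma_tile))
```

Under these assumptions only a quarter of the tile does useful work, so the matrix layout would likely need to be chosen per vendor rather than hardcoded.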