bjin / mpv-prescalers

prescalers for mpv, as user shaders
GNU Lesser General Public License v3.0

SSBOs or UBOs for NNEDI3 weights? #18

Closed: haasn closed this issue 6 years ago

haasn commented 6 years ago

I know we tested UBOs way back in the day, but I'd be curious to try again, especially now with Vulkan etc. We should also try SSBOs instead of UBOs for this sort of thing. I'd love to hear the results of using either.

I realize this requires changes in mpv; maybe we could work together to implement the necessary API?

bjin commented 6 years ago

I thought about this before. Weights could be uploaded via a UBO/SSBO much like a user texture, except that instead of a sampler it binds a float array. We could extend the current //!TEXTURE directive to support UBOs/SSBOs; specifying the array size and array type should be enough.

BTW, what's new about Vulkan that makes UBOs/SSBOs matter here? Is there going to be a performance gain?

haasn commented 6 years ago

Rather than an array, I would just have you define the buffer layout. The internal mpv code needs it one way or the other, so you could just specify:

//!BUFFER UNIFORM <name>
//!LAYOUT vec4 weights[1000];

or

//!BUFFER STORAGE <name>
//!LAYOUT int x; float foo[64];

Buffers are completely opaque to the code; they're just binary blobs - just like textures. It will be up to you to make sure the contents of the buffer are formatted correctly (std140 for uniform buffers and std430 for storage buffers).
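To make the std140/std430 distinction concrete, here is a minimal Python sketch (not mpv code) that packs weights into a binary blob for a `float arr[N]` member. It relies only on the standard array-stride rules: under std430 a float array is tightly packed (4-byte stride), while under std140 every array element is padded to a 16-byte stride. The function names `pack_float_array` and `pack_vec4_array` are hypothetical, and little-endian byte order is assumed.

```python
import struct

def pack_float_array(values, layout):
    """Pack floats as a GLSL `float arr[N]` member under std140 or std430.

    std430: array stride is 4, so floats are tightly packed.
    std140: array stride is rounded up to 16, so each float is
            followed by 12 bytes of zero padding.
    """
    if layout == "std430":
        stride = 4
    elif layout == "std140":
        stride = 16
    else:
        raise ValueError(f"unknown layout: {layout}")
    out = bytearray(stride * len(values))  # zero-initialized, padding stays 0
    for i, v in enumerate(values):
        # "<f": little-endian 32-bit float written at the element's offset
        struct.pack_into("<f", out, i * stride, v)
    return bytes(out)

def pack_vec4_array(values):
    """Pack floats four at a time as `vec4 arr[N]` (stride 16 in both layouts).

    The input is zero-padded to a multiple of 4 floats.
    """
    padded = list(values) + [0.0] * (-len(values) % 4)
    return struct.pack(f"<{len(padded)}f", *padded)

weights = [0.5, -1.25, 2.0]
print(len(pack_float_array(weights, "std430")))  # 12
print(len(pack_float_array(weights, "std140")))  # 48
print(len(pack_vec4_array(weights)))             # 16
```

Declaring the weights as `vec4`, as in the `//!LAYOUT vec4 weights[1000];` example, avoids the 4x padding overhead that a plain `float` array would incur under std140.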

> BTW, what's new about Vulkan that makes UBOs/SSBOs matter here? Is there going to be a performance gain?

I don't know, but it's worth testing. Also, the text-based shaders seem to confuse GLSL->SPIR-V translators quite a bit: the 256-neuron shader even segfaults Mesa's parser. I also worry about instruction cache locality etc. It seems to me it would be better to have an in-shader loop instead of unrolling all 256 of those neurons. With an SSBO + array + loop, this would be easy to do.
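The contrast between unrolled and looped evaluation can be sketched in Python standing in for GLSL. This is a simplified, hypothetical neuron layer, not NNEDI3's actual prediction formula: each neuron is a dot product plus bias followed by `tanh`, and all weights live in one flat array (the assumed layout is `k` weights followed by one bias per neuron). `gen_unrolled_source` emits code with every weight baked in as a literal, roughly what a generated text shader does; `eval_looped` reads the same numbers from the flat buffer inside a loop.

```python
import math
import random

def eval_looped(window, weights, nns):
    """Evaluate nns neurons by looping over a flat weight buffer."""
    k = len(window)
    stride = k + 1  # hypothetical layout: k weights, then the neuron's bias
    total = 0.0
    for n in range(nns):
        base = n * stride
        acc = weights[base + k]  # start from the bias
        for i in range(k):
            acc += window[i] * weights[base + i]
        total += math.tanh(acc)
    return total

def gen_unrolled_source(weights, k, nns):
    """Emit source for the same computation with every weight inlined as a literal."""
    lines = ["def unrolled(x):", "    total = 0.0"]
    for n in range(nns):
        base = n * (k + 1)
        terms = " + ".join(f"x[{i}] * {weights[base + i]!r}" for i in range(k))
        lines.append(f"    total += math.tanh({weights[base + k]!r} + {terms})")
    lines.append("    return total")
    return "\n".join(lines)

k, nns = 8, 16
rng = random.Random(0)
weights = [rng.uniform(-1, 1) for _ in range(nns * (k + 1))]
window = [rng.uniform(0, 1) for _ in range(k)]

src = gen_unrolled_source(weights, k, nns)
namespace = {"math": math}
exec(src, namespace)  # compile the generated, fully unrolled code
unrolled = namespace["unrolled"]

print(abs(unrolled(window) - eval_looped(window, weights, nns)) < 1e-12)  # True
print(len(src))  # source size grows with nns; the loop's code does not
```

Both versions compute the same value, but the unrolled source grows linearly with the neuron count, which is the kind of code size that stresses shader compilers and instruction caches.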

haasn commented 6 years ago

The size of the buffer would be determined by the contents, I guess. Or we could make it explicit, and error if the length of the contents doesn't match?
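An explicit size check could work by computing the byte size implied by the `//!LAYOUT` string and comparing it with the length of the supplied data. The Python sketch below handles only a small, assumed subset of GLSL (`int`, `float`, `vec2`, `vec3`, `vec4`, and 1-D arrays of these, separated by `;`), applies the standard std140/std430 alignment rules, and treats the end offset of the last member as the required size. The names `layout_size` and `check_blob` are hypothetical and not part of mpv.

```python
import re

# (base alignment, size) in bytes for the supported GLSL types
TYPES = {"int": (4, 4), "float": (4, 4), "vec2": (8, 8), "vec3": (16, 12), "vec4": (16, 16)}

def round_up(x, a):
    return (x + a - 1) // a * a

def layout_size(layout, packing):
    """Bytes covered by the members of a simple `//!LAYOUT` string."""
    if packing not in ("std140", "std430"):
        raise ValueError(f"unknown packing: {packing}")
    offset = 0
    for decl in filter(None, (d.strip() for d in layout.split(";"))):
        m = re.fullmatch(r"(\w+)\s+(\w+)(?:\s*\[\s*(\d+)\s*\])?", decl)
        if not m or m.group(1) not in TYPES:
            raise ValueError(f"unsupported declaration: {decl!r}")
        align, size = TYPES[m.group(1)]
        if m.group(3) is not None:
            stride = round_up(size, align)
            if packing == "std140":
                # std140 rounds array alignment and stride up to a vec4 (16 bytes)
                align = round_up(align, 16)
                stride = round_up(stride, 16)
            size = stride * int(m.group(3))
        offset = round_up(offset, align) + size
    return offset

def check_blob(data, layout, packing):
    """Raise if the data length doesn't match the size the layout implies."""
    expected = layout_size(layout, packing)
    if len(data) != expected:
        raise ValueError(f"buffer has {len(data)} bytes, layout needs {expected}")

print(layout_size("vec4 weights[1000];", "std140"))    # 16000
print(layout_size("int x; float foo[64];", "std430"))  # 260
print(layout_size("int x; float foo[64];", "std140"))  # 1040
```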

bjin commented 6 years ago

How would the content be provided, then? A //!DATA directive?

> Or we could make it explicit, and error if the length of the contents doesn't match?

The length doesn't matter much (except for security reasons). We already have both the length of the binary blob and the layout string.

bjin commented 6 years ago

Just for the record: a commit in the nnedi3-ubo branch implements UBO/SSBO support for nnedi3. With UBOs, performance is roughly 10% slower for nns=32, and 50% to 100% slower for larger nns. SSBOs are even slower than UBOs. There is also a size limit for UBOs (it varies by card; 16 KiB is the minimum requirement, which is barely enough for nns=16).

EDIT: relevant mpv change
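As a rough illustration of why the 16 KiB minimum UBO size is tight, the snippet below counts how many 32-bit floats fit in 16384 bytes depending on how the weights are declared under std140. It says nothing about how the nnedi3-ubo branch actually lays out its weights; it only applies the array-stride rules (16 bytes per element for a std140 `float[]`, four floats per 16-byte `vec4`).

```python
# Minimum guaranteed uniform block size (GL_MAX_UNIFORM_BLOCK_SIZE in OpenGL,
# maxUniformBufferRange in Vulkan); actual limits vary per card.
UBO_MIN_BYTES = 16384

def floats_that_fit(limit_bytes, bytes_per_float):
    """Number of floats that fit when each one effectively costs bytes_per_float."""
    return limit_bytes // bytes_per_float

capacity = {
    "std140 float[]": floats_that_fit(UBO_MIN_BYTES, 16),  # each float padded to 16 bytes
    "std140 vec4[]": floats_that_fit(UBO_MIN_BYTES, 4),    # 4 floats share one 16-byte vec4
}
for name, n in capacity.items():
    print(f"{name}: {n} floats")
```

Packing weights into `vec4`s gives four times the capacity of a plain `float` array within the same guaranteed limit.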