bjin / mpv-prescalers

prescalers for mpv, as user shaders
GNU Lesser General Public License v3.0

Try using GL_NV_shader_thread_shuffle #12

Closed haasn closed 5 years ago

haasn commented 7 years ago

https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_thread_shuffle.txt

Not sure if AMD/Intel implement this too. If not, it's probably not worth trying.

In theory, this would let us share samples directly between threads in the same warp without going through shared memory (shmem), which should be even faster. I believe the required change would essentially be rewriting the code that loads the samples (float lumaNN = ...) to load them in groups of 32: each thread loads one value, then uses the warp exchange primitives to shuffle values directly with the other 31 threads.
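
To make the idea concrete, here is a rough CPU-side model in Python of the load-once-then-shuffle pattern described above. It is only a sketch, not shader code and not anything from this repo: `WARP_SIZE`, `load_row`, `shuffle`, and `neighbour` are hypothetical names, the warp is modelled as a plain list of 32 lanes, and clamping at the warp edge is an assumption made to keep the example small.

```python
WARP_SIZE = 32  # assumption: warp width, matching the "groups of 32" above


def load_row(texture_row, base_x):
    """Each lane performs exactly one load: lane i fetches sample base_x + i."""
    return [texture_row[base_x + lane] for lane in range(WARP_SIZE)]


def shuffle(values, src_lanes):
    """Model of a warp shuffle: lane i receives the value held by lane src_lanes[i].

    No shared memory is involved; values move lane to lane.
    """
    return [values[src] for src in src_lanes]


def neighbour(values, offset):
    """Every lane reads the sample owned by lane + offset.

    Assumption: source lanes are clamped to the warp, so edge lanes simply
    see their own value instead of a sample from outside the group.
    """
    src_lanes = [min(max(lane + offset, 0), WARP_SIZE - 1)
                 for lane in range(WARP_SIZE)]
    return shuffle(values, src_lanes)


# Hypothetical row of luma samples; values chosen so they are easy to check.
row = list(range(100, 164))
own = load_row(row, 0)       # one load per lane
left = neighbour(own, -1)    # sample to the left, obtained by shuffle
right = neighbour(own, +1)   # sample to the right, obtained by shuffle
```

In this model each lane ends up with its left and right neighbours after a single load of its own. The edge lanes show the catch: a shuffle only reaches values already held inside the warp, so samples outside the group of 32 would still need an extra load or a fallback path.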

bjin commented 7 years ago

It looks promising, but I don't have an NVIDIA card (or any other card that supports this extension) to work on it at the moment.