Artoriuz / ArtCNN

Super-Resolution Convolutional Neural Networks as GLSL shaders for mpv
MIT License

New compute shaders are much slower on macOS #16

Closed deus0ww closed 4 months ago

deus0ww commented 4 months ago

Do you plan to generate non-compute shaders? Performance has tanked on macOS for me.

Artoriuz commented 4 months ago

Yeah, this is a known issue. The compute shaders are often slower depending on the API/GPU.

I'm currently travelling and I don't remember whether I've saved the trained weights somewhere before leaving, but it's honestly pretty easy to "convert" them yourself in the meantime if they're really that bad.

Just remove the COMPUTE directives, replace void with vec4 for the main function of each shader pass, and return the result directly instead of calling imageStore.

The compute shaders that have been released are otherwise identical to their fragment counterparts.
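The three steps above can be sketched on a trivial pass (the directives follow mpv's user-shader format; the convolution body is replaced with a simple texture read for illustration, and the hook/pass names are just examples, not ArtCNN's actual passes):

```glsl
// Compute version (before): COMPUTE directive, void hook(), imageStore.
//!HOOK LUMA
//!BIND LUMA
//!DESC Example pass (compute)
//!COMPUTE 8 8
void hook() {
    vec4 result = LUMA_texOff(vec2(0.0)); // stand-in for the real convolution
    imageStore(out_image, ivec2(gl_GlobalInvocationID.xy), result);
}

// Fragment version (after): COMPUTE directive removed, hook() returns vec4
// directly instead of calling imageStore.
//!HOOK LUMA
//!BIND LUMA
//!DESC Example pass (fragment)
vec4 hook() {
    vec4 result = LUMA_texOff(vec2(0.0)); // same body as above
    return result;
}
```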

deus0ww commented 4 months ago

> Just remove the COMPUTE directives, replace void with vec4 for the main function of each shader pass, and return the result directly instead of calling imageStore.

This worked. Performance is back to normal. Thank you.

Artoriuz commented 4 months ago

I'll keep this open so I remember to look at this again later. I think the current compute shaders are only faster on Vulkan and perhaps only on AMD.

deus0ww commented 4 months ago

I think there's potential for better performance with compute shaders if you use shared sampling. I'll try it when I have time to figure out WorkGroupID, InvocationID, etc...
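For reference, shared sampling means each workgroup cooperatively loads its input tile (plus the kernel's border) into shared memory once, then every invocation reads from that instead of issuing its own texture fetches. A minimal sketch, assuming an 8x8 workgroup and a 3x3 kernel (the box-blur body is a placeholder for the real trained convolution; the size of the `tile` array is what drives the shared-memory cost mentioned below):

```glsl
//!HOOK LUMA
//!BIND LUMA
//!DESC Shared-sampling sketch (hypothetical)
//!COMPUTE 8 8
// 8x8 output tile plus a 1-texel border on each side for a 3x3 kernel.
shared float tile[10][10];

void hook() {
    // Cooperative load: stride the 10x10 = 100 input texels across the
    // 64 invocations of the workgroup.
    ivec2 base = ivec2(gl_WorkGroupID.xy) * 8 - 1;
    for (uint i = gl_LocalInvocationIndex; i < 100u; i += 64u) {
        ivec2 t = ivec2(int(i) % 10, int(i) / 10);
        tile[t.y][t.x] = texelFetch(LUMA_raw, base + t, 0).x;
    }
    barrier(); // wait until the whole tile is resident in shared memory

    // Placeholder 3x3 box blur read entirely from shared memory.
    ivec2 l = ivec2(gl_LocalInvocationID.xy) + 1;
    float sum = 0.0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
            sum += tile[l.y + dy][l.x + dx];
    imageStore(out_image, ivec2(gl_GlobalInvocationID.xy), vec4(sum / 9.0, 0.0, 0.0, 1.0));
}
```

Note that ArtCNN's intermediate passes carry many feature channels, so the real shared array would be far larger than this single-channel example.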

Artoriuz commented 4 months ago

I think I tried it. Ended up slower than fetching when needed due to how big the shared memory needs to be (I think).

deus0ww commented 4 months ago

> I think I tried it. Ended up slower than fetching when needed due to how big the shared memory needs to be (I think).

Do you still have the code for that? Maybe changing the workgroup size will help with shared memory size.

The other thing to try is storing multiple outputs per pass. That should cut down on the number of passes, too. From the libplacebo doc, it seems like it should be possible: https://github.com/haasn/libplacebo/blob/master/docs/custom-shaders.md#storage
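One reading of "multiple outputs per pass" is having each compute invocation write a small block of pixels rather than one, using mpv's extended COMPUTE syntax (`//!COMPUTE bw bh tw th`, where the block size exceeds the thread count). A hedged sketch, with the actual convolution again elided to a texture read:

```glsl
//!HOOK LUMA
//!BIND LUMA
//!DESC Multi-output sketch (hypothetical)
//!COMPUTE 16 16 8 8
// A 16x16 output block is covered by an 8x8 workgroup, so each
// invocation writes a 2x2 block: four outputs per invocation.
void hook() {
    ivec2 base = ivec2(gl_GlobalInvocationID.xy) * 2;
    for (int dy = 0; dy < 2; dy++) {
        for (int dx = 0; dx < 2; dx++) {
            ivec2 p = base + ivec2(dx, dy);
            vec4 result = texelFetch(LUMA_raw, p, 0); // placeholder computation
            imageStore(out_image, p, result);
        }
    }
}
```

The linked libplacebo doc's `//!STORAGE` directive additionally lets a saved texture be written as an image store, which is what makes writing to arbitrary coordinates like this possible in the first place.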

Artoriuz commented 4 months ago

This comment has the version with all meme optimisations, including writing to multiple pixels per pass: https://github.com/Artoriuz/ArtCNN/issues/4#issuecomment-2051785754

deus0ww commented 4 months ago

Thank you. ArtCNN_C4F16_DEV.txt should keep me busy for a while...

Artoriuz commented 4 months ago

There are a few caveats with how it's currently laid out (listed in the linked comment). I will eventually get back to it, but I've been a bit busy lately.

Personally I still think the ideal approach is probably using a real ML inference engine. I'll try experimenting with the existing options when I have some time (best option is probably using vs-mlrt via a vs script).

Artoriuz commented 4 months ago

I've reverted the "normal" shaders back to fragment shaders and added the "new" compute shaders as additional options under the compute directory. I'll close this issue for now, but feel free to open another or submit a PR if you have any improvements to share!