deus0ww closed this issue 4 months ago
Yeah, this is a known issue. The compute shaders are often slower depending on the API/GPU.
I'm currently travelling and I don't remember whether I've saved the trained weights somewhere before leaving, but it's honestly pretty easy to "convert" them yourself in the meantime if they're really that bad.
Just remove the COMPUTE directives, replace void with vec4 for the main function of each shader pass, and return the result directly instead of calling imageStore.
The compute shaders that have been released are otherwise identical to their fragment counterparts.
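To illustrate, the conversion described above looks roughly like this for an mpv user-shader pass (a minimal sketch; the hook target, pass description, and sampling call are illustrative, not ArtCNN's actual code):

```glsl
// Compute version (before):
//!HOOK LUMA
//!BIND LUMA
//!COMPUTE 8 8
//!DESC example pass (compute)

void hook() {
    vec4 result = LUMA_tex(LUMA_pos);
    imageStore(out_image, ivec2(gl_GlobalInvocationID.xy), result);
}

// Fragment version (after): drop the //!COMPUTE directive, change the
// return type from void to vec4, and return instead of imageStore.
//!HOOK LUMA
//!BIND LUMA
//!DESC example pass (fragment)

vec4 hook() {
    vec4 result = LUMA_tex(LUMA_pos);
    return result;
}
```

The body of each pass stays the same; only the directive and the output mechanism change.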
> Just remove the COMPUTE directives, replace void with vec4 for the main function of each shader pass, and return the result directly instead of calling imageStore.
This worked. Performance is back to normal. Thank you.
I'll keep this open so I remember to look at this again later. I think the current compute shaders are only faster on Vulkan and perhaps only on AMD.
I think there's potential for better performance with compute shaders if you use shared sampling. I'll try it when I have time to figure out WorkGroupID, InvocationID, etc...
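For reference, shared sampling in a compute pass usually means having the workgroup cooperatively load a tile (output block plus kernel border) into shared memory once, then reading neighbours from the tile instead of issuing redundant texture fetches. A minimal sketch with an assumed 16x16 workgroup and a 3x3 kernel (edge clamping omitted for brevity; everything here is illustrative):

```glsl
//!HOOK LUMA
//!BIND LUMA
//!COMPUTE 16 16
//!DESC shared sampling sketch (3x3 kernel)

// 16x16 output block + 1-texel border for a 3x3 kernel.
shared float tile[18][18];

void hook() {
    ivec2 lid  = ivec2(gl_LocalInvocationID.xy);
    ivec2 base = ivec2(gl_WorkGroupID.xy) * 16 - 1;

    // Cooperative load: 18x18 = 324 texels shared across 256 threads,
    // so some threads load two texels.
    for (uint i = gl_LocalInvocationIndex; i < 18u * 18u; i += 16u * 16u) {
        ivec2 t = ivec2(i % 18u, i / 18u);
        tile[t.y][t.x] = texelFetch(LUMA_raw, base + t, 0).x;
    }
    barrier();

    // Example 3x3 box filter over the shared tile (the kernel itself
    // is a placeholder).
    float sum = 0.0;
    for (int dy = 0; dy < 3; dy++)
        for (int dx = 0; dx < 3; dx++)
            sum += tile[lid.y + dy][lid.x + dx];

    imageStore(out_image, ivec2(gl_GlobalInvocationID.xy),
               vec4(sum / 9.0, 0.0, 0.0, 1.0));
}
```

The trade-off mentioned below is real: a larger kernel or more feature channels inflates the tile, and past a point the shared-memory footprint hurts occupancy more than the saved fetches help.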
I think I tried it. It ended up slower than fetching texels on demand, probably because of how large the shared memory needs to be.
> I think I tried it. Ended up slower than fetching when needed due to how big the shared memory needs to be (I think).
Do you still have the code for that? Maybe changing the workgroup size will help with shared memory size.
The other thing to try is storing multiple outputs per pass. That should cut down on the number of passes, too. From the libplacebo doc, it seems like it should be possible: https://github.com/haasn/libplacebo/blob/master/docs/custom-shaders.md#storage
This comment has the version with all meme optimisations, including writing to multiple pixels per pass: https://github.com/Artoriuz/ArtCNN/issues/4#issuecomment-2051785754
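For context, "writing to multiple pixels per pass" in an mpv compute hook typically means declaring an output block larger than the thread group, so each invocation stores several pixels. A minimal sketch (block size, quad layout, and the pass-through "computation" are all illustrative):

```glsl
//!HOOK LUMA
//!BIND LUMA
//!COMPUTE 32 32 16 16
//!DESC multiple outputs per invocation sketch

// The block is 32x32 but only 16x16 threads run, so each invocation
// computes and stores a 2x2 quad, quartering the invocation count.
void hook() {
    ivec2 base = ivec2(gl_GlobalInvocationID.xy) * 2;
    for (int dy = 0; dy < 2; dy++) {
        for (int dx = 0; dx < 2; dx++) {
            ivec2 pos = base + ivec2(dx, dy);
            // Placeholder pass-through; the real shader would do its
            // convolution work here.
            vec4 color = texelFetch(LUMA_raw, pos, 0);
            imageStore(out_image, pos, color);
        }
    }
}
```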
Thank you. ArtCNN_C4F16_DEV.txt should keep me busy for a while...
There are a few caveats with how it's currently laid out (listed in the linked comment). I will eventually get back to it, but I've been a bit busy lately.
Personally I still think the ideal approach is probably using a real ML inference engine. I'll try experimenting with the existing options when I have some time (best option is probably using vs-mlrt via a vs script).
I've reverted the "normal" shaders back to fragment shaders and added the "new" compute shaders as additional options under the compute directory. I'll close this issue for now, but feel free to open another or submit a PR if you have any improvements to share!
Do you plan to generate non-compute shaders? Performance has tanked on macOS for me.