Fixes the OpenCL busy wait. This may indirectly improve GPU thermals on certain hardware setups (e.g. a GPU taking in hot air from the CPU), which can lead to slightly better performance.
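
For readers unfamiliar with the problem, here is a minimal sketch of the general idea: replacing a spin loop with sleep-based polling so the waiting CPU core stays idle. It is not the code in this PR. The atomic flag is a stand-in for an OpenCL completion status, and `wait_with_sleep` and the 1 ms interval are hypothetical choices for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Polls `done` until it is set, sleeping between checks instead of spinning.
/// Returns how many polls found the flag still unset.
/// (Hypothetical helper; the flag stands in for a GPU completion status.)
fn wait_with_sleep(done: &AtomicBool, poll_interval: Duration) -> u64 {
    let mut polls = 0;
    while !done.load(Ordering::Acquire) {
        // Yield the core to the OS rather than burning cycles in a tight loop.
        thread::sleep(poll_interval);
        polls += 1;
    }
    polls
}

fn main() {
    let done = Arc::new(AtomicBool::new(false));
    let worker_flag = Arc::clone(&done);

    // Stand-in for the GPU finishing a kernel after roughly 20 ms.
    let worker = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        worker_flag.store(true, Ordering::Release);
    });

    let polls = wait_with_sleep(&done, Duration::from_millis(1));
    worker.join().unwrap();
    println!("completion observed after {polls} sleeping polls");
}
```

The trade-off is latency: the waiter can notice completion up to one poll interval late, so the interval should be small relative to the kernel run time.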
Makes the OpenCL kernel code more compact with macros, since the code is statically included in the Rust binary. This is purely a tidiness preference; if the old expanded style is preferred, I'll change it back.
Makes the OpenCL nonce buffer 32 bits instead of 64 bits to further reduce CPU -> GPU memory transfers.
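
As a rough illustration of the saving, the sketch below compares buffer sizes for the two element widths. It also shows one possible way a 32-bit value could still address a 64-bit nonce space, assuming the host keeps a 64-bit base and only a 32-bit offset is transferred. That scheme is an assumption for illustration, not something this PR states; `buffer_bytes` and `full_nonce` are hypothetical names.

```rust
use std::mem::size_of;

/// Bytes needed for a buffer holding `count` elements of type `T`.
fn buffer_bytes<T>(count: usize) -> usize {
    count * size_of::<T>()
}

/// Rebuilds a 64-bit nonce from a host-side base and a 32-bit offset.
/// (Assumed scheme for illustration only.)
fn full_nonce(base: u64, offset: u32) -> u64 {
    base.wrapping_add(u64::from(offset))
}

fn main() {
    let count = 1024;
    // Halving the element width halves the bytes moved per transfer.
    println!("u64 buffer: {} bytes", buffer_bytes::<u64>(count));
    println!("u32 buffer: {} bytes", buffer_bytes::<u32>(count));
    println!("nonce: {:#x}", full_nonce(0x1_0000_0000, 7));
}
```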
Changes the leading zeros macro in the OpenCL kernel code to use branching, which is actually very slightly faster.
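
To show what a branching leading-zeros count can look like, here is a small Rust sketch. It is not the OpenCL macro from this PR; it assumes a byte-wide input and a binary-search style cascade of `if` checks, and `leading_zeros_branching` is a hypothetical name. It makes no claim about speed, only about correctness against the standard library's `u8::leading_zeros`.

```rust
/// Counts the leading zero bits of a byte using a cascade of branches.
/// (Illustrative only; the input width and structure are assumptions.)
fn leading_zeros_branching(mut x: u8) -> u32 {
    if x == 0 {
        return 8;
    }
    let mut n = 0;
    // Each test halves the remaining search window.
    if (x & 0xF0) == 0 {
        n += 4;
        x <<= 4;
    }
    if (x & 0xC0) == 0 {
        n += 2;
        x <<= 2;
    }
    if (x & 0x80) == 0 {
        n += 1;
    }
    n
}

fn main() {
    // Exhaustively compare against the built-in implementation.
    for x in 0..=u8::MAX {
        assert_eq!(leading_zeros_branching(x), x.leading_zeros());
    }
    println!("all 256 byte values match u8::leading_zeros");
}
```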
Overall, you should see around 3% gains, possibly more if your GPU temps are affected by CPU usage.