BLAKE3-team / BLAKE3

the official Rust and C implementations of the BLAKE3 cryptographic hash function
Apache License 2.0

BLAKE3 in OpenCL #136

Open ian-bruce opened 3 years ago

ian-bruce commented 3 years ago

have you considered implementing BLAKE3 in the OpenCL language, so that high-speed parallel hashing can be run on GPUs? this might be thousands of times faster than computing the same hash on a general-purpose CPU.

would the portable C implementation be suitable for that?

you can rent NVIDIA GPUs in the Amazon cloud, so this might be quite useful for large-scale hashing applications.

https://aws.amazon.com/ec2/instance-types/p2/

the OpenCL specification is here:

https://www.khronos.org/registry/OpenCL/

how hard would it be to convert the algorithm to OpenCL?

oconnor663 commented 3 years ago

@cesarb did some experimental work on a Vulkan implementation here: https://github.com/BLAKE3-team/BLAKE3/pull/80

I don't know enough about OpenCL to say what code might be usable as-is. The bulk of the C implementation consists of optimized implementations for specific CPU instruction sets (AVX-512, AVX2, etc.), and those of course aren't going to be portable to the GPU. However, the higher-level C code in blake3.c might be portable (though again, I don't know much about how OpenCL works). It's organized around a hash_many abstraction, which hashes a batch of independent inputs in one call, and I assume that will map well to how the GPU wants to do things. Even if we can't use that code directly, we'd probably take a similar approach in a rewrite.
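
To make the hash_many idea concrete, here is a minimal sketch in plain C of what a batch-oriented entry point looks like. Everything in it is an assumption for illustration: the names toy_hash_one and toy_hash_many are hypothetical, the signature is simplified and is not the one in blake3.c, and the per-input function is FNV-1a, used only as a stand-in. It is not the BLAKE3 compression function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder per-input hash (32-bit FNV-1a).
 * This is NOT BLAKE3; it only stands in for "hash one input". */
uint32_t toy_hash_one(const uint8_t *input, size_t len) {
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= input[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical hash_many-shaped entry point: num_inputs independent
 * inputs of equal length go in, one digest per input comes out.
 * No iteration depends on another, so on a CPU the loop body can be
 * spread across SIMD lanes, and on a GPU each iteration could
 * plausibly become one work-item (an assumption, not tested here). */
void toy_hash_many(const uint8_t *const *inputs, size_t num_inputs,
                   size_t input_len, uint32_t *out) {
    assert(inputs != NULL && out != NULL);
    for (size_t i = 0; i < num_inputs; i++) {
        out[i] = toy_hash_one(inputs[i], input_len);
    }
}
```

The point of the shape is that the caller hands over a whole batch at once, so the backend is free to decide how many inputs to process in parallel.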

At the end of the day, though, even if we come up with a good GPU implementation, memory read speed is likely to be a bottleneck. The GPU probably won't be able to read bytes from memory as quickly as it can hash them. If you take a look at Figure 4 in the BLAKE3 paper, you can see the point where we start hitting memory bandwidth problems even on a CPU.
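
As a rough back-of-the-envelope model of that bottleneck, the sketch below computes end-to-end throughput from two rates: how fast the device can hash and how fast bytes can be fed to it. The function names and the two-stage model are assumptions for illustration, and any numbers plugged in are hypothetical, not measurements of BLAKE3 or of any GPU.

```c
#include <assert.h>

/* Pipelined case: feeding data overlaps with hashing, so the slower
 * of the two stages sets the overall rate. */
double pipelined_gbps(double hash_gbps, double feed_gbps) {
    assert(hash_gbps > 0.0 && feed_gbps > 0.0);
    return hash_gbps < feed_gbps ? hash_gbps : feed_gbps;
}

/* Serialized case: each byte is first delivered and only then hashed,
 * so the per-byte times add up. */
double serialized_gbps(double hash_gbps, double feed_gbps) {
    assert(hash_gbps > 0.0 && feed_gbps > 0.0);
    return 1.0 / (1.0 / hash_gbps + 1.0 / feed_gbps);
}
```

With a hypothetical hash rate of 30 GB/s and a feed rate of 10 GB/s, the pipelined model gives 10 GB/s and the serialized model gives 7.5 GB/s. In both cases, making the hash faster barely helps once the feed rate is the smaller number.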