ec-gpu & ec-gpu-gen
CUDA/OpenCL code generator for finite-field arithmetic over prime fields and elliptic curve arithmetic constructed with Rust.
Notes:
Generating CUDA/OpenCL code for blstrs Scalar elements:
use blstrs::Scalar;
use ec_gpu_gen::SourceBuilder;
let source = SourceBuilder::new()
    .add_field::<Scalar>()
    .build_64_bit_limbs();
This crate usually creates GPU kernels at compile time. For CUDA it generates a fatbin, while for OpenCL it generates only the source code, which is then compiled at run time.
To make this easier to use, helper functions are available. You put some code into build.rs that generates the kernels, and some code into your library that consumes those generated kernels. The kernels are embedded directly into your program/library. If something goes wrong, you get an error at compile time.
In this example we will make use of the FFT functionality. Add to your build.rs:
use blstrs::Scalar;
use ec_gpu_gen::SourceBuilder;
fn main() {
    // Register the FFT kernel for the scalar field, then generate/compile it.
    let source_builder = SourceBuilder::new().add_fft::<Scalar>();
    ec_gpu_gen::generate(&source_builder);
}
ec_gpu_gen::generate() takes care of the actual code generation and compilation. It automatically creates a CUDA and/or OpenCL kernel and defines two environment variables, which are meant for internal use: _EC_GPU_CUDA_KERNEL_FATBIN, which points to the compiled CUDA kernel, and _EC_GPU_OPENCL_KERNEL_SOURCE, which points to the generated OpenCL source.
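As a rough illustration of how a build script can hand such variables to the crate being compiled, the sketch below formats Cargo's `cargo:rustc-env=KEY=VALUE` build-script directive for the two variable names mentioned above. The helper names `rustc_env_directive` and `kernel_directives` and the example paths are hypothetical, and this is an assumption about the general mechanism rather than a copy of what `ec_gpu_gen::generate()` does internally.

```rust
// Variable names as given in the text above.
const CUDA_FATBIN_VAR: &str = "_EC_GPU_CUDA_KERNEL_FATBIN";
const OPENCL_SOURCE_VAR: &str = "_EC_GPU_OPENCL_KERNEL_SOURCE";

/// Formats the Cargo build-script directive that exposes `key` to the
/// compiler invocation of the crate (readable there via `env!`).
fn rustc_env_directive(key: &str, path: &str) -> String {
    format!("cargo:rustc-env={}={}", key, path)
}

/// Hypothetical helper: the lines a build script would print to stdout
/// for whichever kernels were produced (CUDA fatbin and/or OpenCL source).
fn kernel_directives(fatbin: Option<&str>, opencl_source: Option<&str>) -> Vec<String> {
    let mut lines = Vec::new();
    if let Some(path) = fatbin {
        lines.push(rustc_env_directive(CUDA_FATBIN_VAR, path));
    }
    if let Some(path) = opencl_source {
        lines.push(rustc_env_directive(OPENCL_SOURCE_VAR, path));
    }
    lines
}
```

A build script would print each returned line with `println!`; Cargo then sets the variable while compiling the crate, so a macro can read it at compile time.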
Those variables are then picked up by the ec_gpu_gen::program!() macro, which generates a program for a given GPU device. Using FFT within your library would then look like this:
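use blstrs::Scalar;
use ec_gpu_gen::{fft::FftKernel, rust_gpu_tools::Device};

// Create one program per available GPU device.
let devices = Device::all();
let programs = devices
    .iter()
    .map(|device| ec_gpu_gen::program!(device))
    .collect::<Result<_, _>>()
    .expect("Cannot create programs!");

// `coeffs`, `omega` and `log_d` are assumed to be provided by the caller.
// The field type must match the one passed to `add_fft` in build.rs.
let mut kern = FftKernel::<Scalar>::create(programs).expect("Cannot initialize kernel!");
kern.radix_fft_many(&mut [&mut coeffs], &[omega], &[log_d]).expect("GPU FFT failed!");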
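To make the three inputs more concrete, here is a small CPU-only sketch of a naive transform over a toy prime field. It assumes the usual convention that the input has length 2^log_d and that omega is a primitive 2^log_d-th root of unity; the function names and the modulus 17 are illustrative only, and the GPU kernel's own output conventions are not verified here.

```rust
/// Modular exponentiation by squaring: base^exp mod modulus.
/// Only intended for small moduli, so products fit into u64.
fn pow_mod(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut acc = 1 % modulus;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    acc
}

/// Naive O(n^2) discrete Fourier transform over the integers mod `modulus`:
/// out[i] = sum_j coeffs[j] * omega^(i*j).
fn naive_fft(coeffs: &[u64], omega: u64, log_d: u32, modulus: u64) -> Vec<u64> {
    let n = 1usize << log_d;
    assert_eq!(coeffs.len(), n, "input length must be 2^log_d");
    (0..n)
        .map(|i| {
            coeffs.iter().enumerate().fold(0u64, |acc, (j, c)| {
                let term = (*c % modulus) * pow_mod(omega, (i * j) as u64, modulus);
                (acc + term) % modulus
            })
        })
        .collect()
}
```

For example, modulo 17 the element 4 is a primitive 4th root of unity, so `naive_fft(&[1, 2, 3, 4], 4, 2, 17)` evaluates the polynomial 1 + 2x + 3x^2 + 4x^3 at the powers of 4 and returns `[10, 7, 15, 6]`.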
This crate supports CUDA and OpenCL, which can be enabled with the cuda and opencl feature flags.
EC_GPU_CUDA_NVCC_ARGS
By default, the CUDA kernel is compiled for several architectures, which may take a long time. EC_GPU_CUDA_NVCC_ARGS can be used to override those arguments. The input and output files are still set automatically.
// Example for compiling the kernel for only the Turing architecture.
EC_GPU_CUDA_NVCC_ARGS="--fatbin --gpu-architecture=sm_75 --generate-code=arch=compute_75,code=sm_75"
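The following sketch shows one way such an override could be turned into an nvcc argument list, with the input and output files appended regardless of the override. The function name `nvcc_command_line`, the placeholder default, the file names, and the whitespace-only splitting (no shell quoting) are assumptions for illustration; the crate's real default argument list is not reproduced here.

```rust
/// Hypothetical helper: builds the argument list for an `nvcc` invocation.
/// `override_args` plays the role of the `EC_GPU_CUDA_NVCC_ARGS` value.
fn nvcc_command_line(override_args: Option<&str>, input: &str, output: &str) -> Vec<String> {
    // PLACEHOLDER default, not the crate's actual multi-architecture defaults.
    const PLACEHOLDER_DEFAULTS: &str = "--fatbin";

    // Split on whitespace only; quoted arguments are not handled in this sketch.
    let mut args: Vec<String> = override_args
        .unwrap_or(PLACEHOLDER_DEFAULTS)
        .split_whitespace()
        .map(String::from)
        .collect();

    // Input and output are always appended, independent of the override.
    args.push(input.to_string());
    args.push("-o".to_string());
    args.push(output.to_string());
    args
}
```

With the Turing example above, the resulting list is the three override flags followed by the input file, `-o`, and the output file.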
EC_GPU_FRAMEWORK
When the library is built with both CUDA and OpenCL support, you can choose which one to use at run time. The default is cuda, which is used when you set nothing or any other (invalid) value. The other possible value is opencl.
// Example for setting it to OpenCL.
EC_GPU_FRAMEWORK=opencl
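The selection rule described above can be sketched as a small pure function. The `Framework` enum and both function names are hypothetical, and the exact, case-sensitive match on `opencl` is an assumption rather than confirmed library behaviour.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Framework {
    Cuda,
    OpenCl,
}

/// Maps an `EC_GPU_FRAMEWORK`-style value to a framework:
/// `opencl` selects OpenCL, anything else (including unset) selects CUDA.
fn framework_from_value(value: Option<&str>) -> Framework {
    match value {
        Some("opencl") => Framework::OpenCl,
        _ => Framework::Cuda,
    }
}

/// Reads the variable from the process environment at run time.
fn selected_framework() -> Framework {
    framework_from_value(std::env::var("EC_GPU_FRAMEWORK").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback behaviour easy to check without touching process-wide state.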
EC_GPU_NUM_THREADS
Restricts the number of threads used in the library. The default is set to the number of logical cores reported on the machine.
// Example for setting the maximum number of threads to 6.
EC_GPU_NUM_THREADS=6
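A minimal sketch of this kind of setting, using only the standard library: parse the value if present, otherwise fall back to the number of logical cores. The function names are hypothetical, and falling back to the default on an unparsable or zero value is an assumption, not documented behaviour of the library.

```rust
/// Number of logical cores reported by the standard library (at least 1).
fn default_threads() -> usize {
    std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

/// Interprets an `EC_GPU_NUM_THREADS`-style value.
/// ASSUMPTION: invalid or zero values fall back to the default.
fn num_threads_from(value: Option<&str>) -> usize {
    value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or_else(default_threads)
}
```

Called as `num_threads_from(std::env::var("EC_GPU_NUM_THREADS").ok().as_deref())`, this yields 6 for the example setting above.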
Licensed under either of
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.