Hi!
I'm trying to use the generated GPU code for `bls12_377` and stumbled across the function `Fq_mul_nvidia`. It doesn't work for me (yes, I do have an Nvidia GPU), while `Fq_mul_default` works fine. Are there any known pitfalls with this function? `bls12_377` has a 377-bit modulus; could that be the crux of the problem? Or does it depend on the limb type? I'm using `Limb64`, since the code I'm trying to optimize uses 64-bit limbs.
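To narrow it down, one option is to check both kernels against a host-side reference on the same inputs. Below is a minimal Python sketch of CIOS Montgomery multiplication over `w`-bit limbs. It is not taken from the library: `mont_mul_cios`, `to_limbs` and `from_limbs` are hypothetical helper names, and it assumes the generated code stores field elements in Montgomery form with R = 2^(limb_bits * limb_count), which I haven't verified for this generator. The modulus is the BLS12-377 base field modulus as I recall it from arkworks' `ark-bls12-377` (worth double-checking against the crate's constants; the algorithm itself works for any odd modulus).

```python
# Host-side reference for Montgomery multiplication, for comparing GPU output.
# Assumption: elements are in Montgomery form with R = 2^(w * n).

# BLS12-377 base field modulus (377 bits), as recalled from ark-bls12-377.
P_BLS12_377 = int(
    "01ae3a4617c510eac63b05c06ca1493b1a22d9f300f5138f"
    "1ef3622fba094800170b5d44300000008508c00000000001", 16)


def to_limbs(x, w, n):
    """Split x into n little-endian limbs of w bits each."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]


def from_limbs(limbs, w):
    """Recombine little-endian w-bit limbs into an integer."""
    return sum(limb << (w * i) for i, limb in enumerate(limbs))


def mont_mul_cios(a, b, p, w):
    """Return a * b * R^-1 mod p with R = 2^(w*n), using the CIOS method.

    a and b must already be reduced (0 <= a, b < p); p must be odd.
    """
    n = -(-p.bit_length() // w)          # number of w-bit limbs (ceil division)
    mask = (1 << w) - 1
    inv = (-pow(p, -1, 1 << w)) & mask   # -p^-1 mod 2^w
    A, B, P = to_limbs(a, w, n), to_limbs(b, w, n), to_limbs(p, w, n)
    t = [0] * (n + 2)
    for i in range(n):
        # Multiply step: t += A * B[i]
        carry = 0
        for j in range(n):
            s = t[j] + A[j] * B[i] + carry
            t[j], carry = s & mask, s >> w
        s = t[n] + carry
        t[n], t[n + 1] = s & mask, s >> w
        # Reduce step: add m * P so the low limb becomes zero, then shift by one limb
        m = (t[0] * inv) & mask
        carry = (t[0] + m * P[0]) >> w
        for j in range(1, n):
            s = t[j] + m * P[j] + carry
            t[j - 1], carry = s & mask, s >> w
        s = t[n] + carry
        t[n - 1] = s & mask
        t[n] = t[n + 1] + (s >> w)
    res = from_limbs(t[:n + 1], w)
    return res - p if res >= p else res  # final conditional subtraction


if __name__ == "__main__":
    a, b = 12345, 67890
    for w in (32, 64):
        n = -(-P_BLS12_377.bit_length() // w)
        # Both lines print the same value: 12 * 32 = 6 * 64 = 384, so R = 2^384 either way.
        print(w, n, hex(mont_mul_cios(a, b, P_BLS12_377, w)))
```

Since 377 bits round up to 384 bits with both 32-bit and 64-bit limbs, R is the same in both cases, so if both kernels use this convention a correct `Fq_mul_nvidia` should give bit-identical results to `Fq_mul_default` regardless of limb type. Printing the limbs of the first mismatching input would show whether the error is in a low limb (carry handling) or only in the final reduction.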