data61 / cuda-fixnum

Extended-precision modular arithmetic library that targets CUDA.

Enforce exponent sharing in sliding-window modexp function #41

Open unzvfu opened 6 years ago

unzvfu commented 6 years ago

At the moment the exponent window array is allocated with malloc once per slot (see modexp<...>::modexp(...)), even though it doesn't make much sense to use the function unless all the exponents in the warp (or even the thread block) are the same.

Also, malloc'ing all that data is computationally expensive.

It might also exceed the 8MB default heap size. The caller would then have to manage the heap size manually from outside the modexp function call, which would be a pain in the neck.
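
To make the sharing idea concrete, here is a minimal host-side sketch. It is not the library's code. The names WindowStep, make_window_plan, modexp_with_plan and naive_modexp are hypothetical, and it uses 64-bit exponents with moduli below 2^32 in place of fixnums. The point is only that the window decomposition depends on the exponent alone, so it can be computed once and reused for every base that shares that exponent, rather than being allocated per slot.

```cpp
#include <cstdint>
#include <vector>

// One step of a sliding-window plan (hypothetical type, for illustration).
struct WindowStep {
    unsigned squarings;  // number of squarings to perform first
    unsigned odd_value;  // odd window value to multiply by; 0 means no multiply
};

// Decompose exponent e into left-to-right sliding windows of width <= k.
// Assumes 1 <= k <= 16. Depends only on e, so it can be shared by all bases.
std::vector<WindowStep> make_window_plan(std::uint64_t e, unsigned k) {
    std::vector<WindowStep> plan;
    unsigned pending = 0;  // squarings owed for zero bits seen so far
    int i = 63;
    while (i >= 0 && !((e >> i) & 1)) --i;  // skip leading zero bits
    while (i >= 0) {
        if (!((e >> i) & 1)) { ++pending; --i; continue; }
        int l = i - static_cast<int>(k) + 1;
        if (l < 0) l = 0;
        while (!((e >> l) & 1)) ++l;  // shrink so the window ends in a 1 bit
        unsigned width = static_cast<unsigned>(i - l + 1);
        unsigned value = static_cast<unsigned>((e >> l) & ((1u << width) - 1));
        plan.push_back({pending + width, value});
        pending = 0;
        i = l - 1;
    }
    if (pending) plan.push_back({pending, 0});  // trailing zero bits
    return plan;
}

// Multiply modulo m; assumes a, b < m < 2^32 so the product fits in 64 bits.
static std::uint64_t mulmod(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    return (a * b) % m;
}

// Evaluate base^e mod m using a plan built by make_window_plan(e, k).
std::uint64_t modexp_with_plan(std::uint64_t base,
                               const std::vector<WindowStep>& plan,
                               std::uint64_t m, unsigned k) {
    // Per-base table of odd powers: table[j] = base^(2j+1) mod m.
    std::vector<std::uint64_t> table(1u << (k - 1));
    table[0] = base % m;
    std::uint64_t b2 = mulmod(table[0], table[0], m);
    for (std::size_t j = 1; j < table.size(); ++j)
        table[j] = mulmod(table[j - 1], b2, m);

    std::uint64_t result = 1 % m;
    for (const WindowStep& step : plan) {
        for (unsigned s = 0; s < step.squarings; ++s)
            result = mulmod(result, result, m);
        if (step.odd_value)
            result = mulmod(result, table[step.odd_value >> 1], m);
    }
    return result;
}

// Plain square-and-multiply, used only as a reference for checking.
std::uint64_t naive_modexp(std::uint64_t base, std::uint64_t e, std::uint64_t m) {
    std::uint64_t result = 1 % m, b = base % m;
    for (; e; e >>= 1) {
        if (e & 1) result = mulmod(result, b, m);
        b = mulmod(b, b, m);
    }
    return result;
}
```

In this sketch the caller builds the plan once, e.g. with make_window_plan(e, 4), and passes the same plan to modexp_with_plan for each base. Only the odd-power table remains per base. On the device, the analogous arrangement would presumably be one plan per warp or thread block instead of one per slot.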

unzvfu commented 4 years ago

Follow up at https://github.com/unzvfu/cuda-fixnum/issues/23.