imeckler closed this issue 5 years ago.
I believe this is now fixed in this branch. Could you try it and let me know if it worked?
Edit: Note that you can now specify the desired gen_code at build time with `GENCODES=75 make bench`.
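For reference, a gen_code of 75 corresponds to compute capability 7.5 (Turing). Below is a minimal sketch of how a list of such values could expand into nvcc flags. It assumes the Makefile turns each value in `GENCODES` into one `-gencode` flag, which is an assumption and not necessarily the project's actual Makefile logic. `gencode_flags` is a hypothetical helper.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from the project's Makefile): expand a
// space-separated list of compute capabilities, e.g. "61 75", into the
// nvcc -gencode flags that target each one.
std::string gencode_flags(const std::string& gencodes) {
    std::istringstream in(gencodes);
    std::string cc;     // one compute capability, e.g. "75"
    std::string flags;  // accumulated flags, separated by single spaces
    while (in >> cc) {
        if (!flags.empty()) flags += ' ';
        // Build SASS for sm_<cc> from the compute_<cc> virtual architecture.
        flags += "-gencode arch=compute_" + cc + ",code=sm_" + cc;
    }
    return flags;
}
```

For example, `gencode_flags("61 75")` returns `-gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75`. Passing several `-gencode` flags to nvcc embeds code for each architecture in one binary.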
The `GENCODES=75 make bench` command now works, but I'm not sure the resulting bench program is working. The output is:

    ~/cuda-fixnum-fix-build# ./bench/bench
    Function: mul_lo, #elts: 0e3
    fixnum digit total data      time  Kops/s
      bits  bits      (MiB) (seconds)
        32    32        0.0     0.000    83.3
        64    32        0.0     0.000    71.4
       128    32        0.0     0.000    90.9
       256    32        0.0     0.000    83.3
       512    32        0.0     0.000    83.3
      1024    32        0.0     0.000    83.3
        64    64        0.0     0.000    90.9
       128    64        0.0     0.000    83.3
       256    64        0.0     0.000    90.9
       512    64        0.0     0.000    83.3
      1024    64        0.0     0.000    83.3
      2048    64        0.0     0.000    76.9
    Function: mul_wide, #elts: 0e3
    fixnum digit total data      time  Kops/s
      bits  bits      (MiB) (seconds)
        32    32        0.0     0.000   100.0
        64    32        0.0     0.000    90.9
       128    32        0.0     0.000    90.9
       256    32        0.0     0.000    90.9
       512    32        0.0     0.000    83.3
      1024    32        0.0     0.000    76.9
        64    64        0.0     0.000    83.3
       128    64        0.0     0.000    90.9
       256    64        0.0     0.000    90.9
       512    64        0.0     0.000    90.9
      1024    64        0.0     0.000    90.9
      2048    64        0.0     0.000    71.4
    Function: sqr_wide, #elts: 0e3
    fixnum digit total data      time  Kops/s
      bits  bits      (MiB) (seconds)
        32    32        0.0     0.000    90.9
        64    32        0.0     0.000    90.9
       128    32        0.0     0.000    90.9
       256    32        0.0     0.000    83.3
       512    32        0.0     0.000    90.9
      1024    32        0.0     0.000    76.9
        64    64        0.0     0.000    90.9
       128    64        0.0     0.000    90.9
       256    64        0.0     0.000    90.9
       512    64        0.0     0.000    83.3
      1024    64        0.0     0.000    76.9
      2048    64        0.0     0.000    76.9
    Function: modexp redc, #elts: 0e3
    fixnum digit total data      time  Kops/s
      bits  bits      (MiB) (seconds)
    Segmentation fault (core dumped)

Thanks for following up. That error happens when the number of elements you want to benchmark is not specified on the command line. With any luck I've addressed that in issue #66.
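This failure mode can be avoided by validating the command line before the element count is used. Below is a minimal sketch. It assumes the count is passed as the first positional argument. `parse_element_count` is a hypothetical name, and this is neither the project's actual code nor the change made in #66.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: read the number of elements to benchmark from
// argv[1]. Returns false (instead of reading a missing argv[1]) when the
// argument is absent or is not a positive integer.
bool parse_element_count(int argc, char* const argv[], long& nelts) {
    if (argc < 2 || argv[1] == nullptr) return false;  // no count given
    errno = 0;
    char* end = nullptr;
    long n = std::strtol(argv[1], &end, 10);
    // Reject empty input, trailing junk, overflow, and non-positive counts.
    if (end == argv[1] || *end != '\0' || errno == ERANGE || n <= 0) return false;
    nelts = n;
    return true;
}
```

In `main`, a `false` return is the natural point to print a usage message (e.g. `usage: bench <#elts>`) and exit with a nonzero status, rather than running the benchmarks with no valid count.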
Thank you for all your help. I got it working, and the performance for multiplication and squaring is really impressive. However, modexp redc never terminates; let me know if I should create an issue for that.
You're welcome, and yes, please create a new issue for the modexp redc problem.
Hi! I'm trying to build with CUDA 10.1.168 and edited the Makefile to use the appropriate gen_code (which is 75). Running `make bench` yields:

I believe there is a TODO related to this in `src/fixnum/slot_layout.cu`: