Build failure on cuda 10.1 with RTX 2080

imeckler commented 5 years ago

Hi! I'm trying to build with CUDA 10.1.168 and edited the makefile to use the appropriate gen_code (which is 75).

make bench yields

nvcc -lineinfo -ccbin clang -Wno-deprecated-declarations -std=c++11 -Xcompiler -Wall,-Wextra --gpu-architecture=compute_75 --gpu-code=sm_75 -I./src -lstdc++ -lgtest -o bench/bench bench/bench.cu

ptxas /tmp/tmpxft_00000a18_00000000-5_bench.ptx, line 441; error   : Instruction 'vote' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4
...MANY_MORE_ERRORS

I believe there is a TODO related to this in src/fixnum/slot_layout.cu:

 * TODO: All of the warp vote and warp shuffle functions will be
 * deprecated in CUDA 9.0 in favour of versions that take a mask
 * selecting relevant lanes in the warp on which to act (see CUDA
 * Programming Guide, B.15). Create an interface that encapsulates
 * both.

unzvfu commented 5 years ago

I believe this is now fixed in this branch. Could you try it and let me know if it worked?

Edit: Note that you can now build with $ GENCODES=75 make bench to specify the desired gen_code.

imeckler commented 5 years ago

That command now builds successfully, but I'm not sure the resulting bench program is working correctly. The output is:

~/cuda-fixnum-fix-build# ./bench/bench 
Function: mul_lo, #elts: 0e3
fixnum digit  total data   time       Kops/s
 bits  bits     (MiB)    (seconds)
   32    32       0.0     0.000          83.3
   64    32       0.0     0.000          71.4
  128    32       0.0     0.000          90.9
  256    32       0.0     0.000          83.3
  512    32       0.0     0.000          83.3
 1024    32       0.0     0.000          83.3

   64    64       0.0     0.000          90.9
  128    64       0.0     0.000          83.3
  256    64       0.0     0.000          90.9
  512    64       0.0     0.000          83.3
 1024    64       0.0     0.000          83.3
 2048    64       0.0     0.000          76.9

Function: mul_wide, #elts: 0e3
fixnum digit  total data   time       Kops/s
 bits  bits     (MiB)    (seconds)
   32    32       0.0     0.000         100.0
   64    32       0.0     0.000          90.9
  128    32       0.0     0.000          90.9
  256    32       0.0     0.000          90.9
  512    32       0.0     0.000          83.3
 1024    32       0.0     0.000          76.9

   64    64       0.0     0.000          83.3
  128    64       0.0     0.000          90.9
  256    64       0.0     0.000          90.9
  512    64       0.0     0.000          90.9
 1024    64       0.0     0.000          90.9
 2048    64       0.0     0.000          71.4

Function: sqr_wide, #elts: 0e3
fixnum digit  total data   time       Kops/s
 bits  bits     (MiB)    (seconds)
   32    32       0.0     0.000          90.9
   64    32       0.0     0.000          90.9
  128    32       0.0     0.000          90.9
  256    32       0.0     0.000          83.3
  512    32       0.0     0.000          90.9
 1024    32       0.0     0.000          76.9

   64    64       0.0     0.000          90.9
  128    64       0.0     0.000          90.9
  256    64       0.0     0.000          90.9
  512    64       0.0     0.000          83.3
 1024    64       0.0     0.000          76.9
 2048    64       0.0     0.000          76.9

Function: modexp redc, #elts: 0e3
fixnum digit  total data   time       Kops/s
 bits  bits     (MiB)    (seconds)
Segmentation fault (core dumped)

unzvfu commented 5 years ago

Thanks for following up. That segmentation fault happens when the number of elements to benchmark is not specified on the command line. With any luck I've addressed that in issue #66.
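
For readers hitting the same crash, here is a minimal sketch of the kind of guard that avoids it: validate the element-count argument before any benchmark runs. The helper name `parse_elt_count` is hypothetical and is not taken from cuda-fixnum; how bench actually reads its arguments may differ.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not from cuda-fixnum): parse a positive element
// count from argv[1]. Returns 0 if the argument is missing, is not a
// plain decimal number, overflows a long, or is not positive.
long parse_elt_count(int argc, char *argv[]) {
    if (argc < 2)
        return 0;                      // no element count given

    errno = 0;
    char *end = nullptr;
    long n = std::strtol(argv[1], &end, 10);

    if (errno != 0)                    // out of range for long
        return 0;
    if (end == argv[1] || *end != '\0')
        return 0;                      // empty string or trailing junk
    if (n <= 0)
        return 0;                      // count must be positive

    return n;
}
```

A caller would check the result at the top of `main` and, on 0, print a usage message and exit with a non-zero status instead of continuing to the benchmarks with an unusable count.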

imeckler commented 5 years ago

Thank you for all your help. I got it working, and the performance is really impressive for multiplication and squaring. However, modexp redc never terminates. Let me know if I should create an issue for that.

unzvfu commented 5 years ago

You're welcome, and yes, please create a new issue for the modexp redc problem.

data61 / cuda-fixnum

Build failure on cuda 10.1 with RTX 2080 #65