cbuchner1 / CudaMiner

a CUDA accelerated litecoin mining application based on pooler's CPU miner

"now compiling for compute 1.0" does not give higher results on Fermi on Ubuntu 12.04 #96

Open biozshock opened 10 years ago

biozshock commented 10 years ago

I have a GeForce GTX 560, Ubuntu 12.04, and CUDA 5.5. If I manually change the Makefile to use -arch=compute_20, the hashrate goes up from approx. 95 khash/s to about 115 khash/s. So maybe it is worth tuning this? Both versions were run with -i 1 -H 1 -l F11x8 -C 1.

vxf commented 10 years ago

Spot-on @biozshock! I compiled the latest commit with -arch=sm_21, which is equivalent to -arch=compute_21 -code=compute_21,sm_21, instead of the default -arch=compute_10, and my GTX 550 Ti is now displaying its best hashrates ever. It now also autotunes well. I consider this a continuation of #84.

cbuchner1 commented 10 years ago

Maybe we can get the best of both worlds and compile for both architectures (compute_10 and sm_20/sm_21) in parallel, so legacy users can keep using this kernel.

Does it make any difference in performance whether one chooses compute_21 or compute_20?
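One way the "both architectures" idea could look is a fat binary built with nvcc's `-gencode arch=...,code=...` option. The sketch below only assembles and prints the command line; it does not run nvcc. The helper name `gencode_flags` and the chosen targets are assumptions for illustration, not what the CudaMiner Makefile does, and it presumes fermi_kernel.cu still compiles for compute 1.0.

```shell
#!/bin/sh
# Hypothetical helper: turn "virtual:real" pairs into nvcc -gencode flags.
# Each pair embeds code for one target into the same object file (a fat binary).
gencode_flags() {
    flags=""
    for pair in "$@"; do
        virt=${pair%%:*}   # part before the colon, e.g. compute_20
        real=${pair##*:}   # part after the colon, e.g. sm_21
        flags="$flags -gencode arch=$virt,code=$real"
    done
    # Strip the leading space and print the result.
    echo "${flags# }"
}

# Assumed targets: PTX for compute 1.0 (JIT-compiled by the driver on legacy
# cards) plus native code for Fermi sm_20 and sm_21.
FLAGS=$(gencode_flags compute_10:compute_10 compute_20:sm_20 compute_20:sm_21)
echo "nvcc $FLAGS -o fermi_kernel.o -c fermi_kernel.cu"
```

At load time the driver picks the best matching embedded image, so legacy cards would fall back to the compute 1.0 PTX while Fermi cards use the native code.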

2014-02-13 1:22 GMT+01:00 Vasco Flores notifications@github.com:

Spot-on biozshock! I compiled the latest commit with -arch=sm_21, which is equivalent to -arch=compute_21 -code=compute_21,sm_21, instead of the default -arch=compute_10, and my GTX 550 Ti is now displaying its best hashrates ever. It now also autotunes well. I consider this a continuation of #84 (https://github.com/cbuchner1/CudaMiner/issues/84).


biozshock commented 10 years ago

Compiling now with -arch=sm_21, as there is no -arch=compute_21. As I understand it, nv_kernel.cu and nv_kernel2.cu are for Kepler and Titan, right?

Got an error: "Too big maxrregcount value specified 64, will be ignored". As per the docs, there will be no max for these.
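The message is consistent with the limit of 63 registers per thread on compute capability 2.x (Fermi), which a request for 64 exceeds. Below is a minimal sketch of clamping the value before handing it to nvcc. The helper name `clamp_regs` is hypothetical, and only the Fermi limit is encoded; other targets are passed through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: cap a requested --maxrregcount at the per-thread
# register limit of the target. Only Fermi (sm_20/sm_21, limit 63) is covered.
clamp_regs() {
    requested=$1
    target=$2
    case "$target" in
        sm_20|sm_21) limit=63 ;;
        *) echo "$requested"; return 0 ;;  # unknown target: pass through unchanged
    esac
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

echo "--maxrregcount=$(clamp_regs 64 sm_21)"   # prints: --maxrregcount=63
```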

Will get back after it runs at least 30-60 minutes.

EDIT: It seems to do a bit better if I set -arch=sm_21 instead of -arch=compute_20. The hashrate barely went up, but it is much more stable. That is probably because maxrregcount was ignored, though.

vxf commented 10 years ago

I didn't choose -arch=sm_21 for any special reason; it just seemed the best fit for my card after checking the syntax in the CUDA docs. I may try other settings when I get the time, if that helps.

biozshock commented 10 years ago

Hm, -arch=compute_21 -code=compute_21,sm_21 gives an error here: "nvcc fatal : Value 'compute_21' is not defined for option 'gpu-architecture'". Which CUDA version do you use to get that?

vxf commented 10 years ago

@biozshock you are right, there is indeed no compute_21 defined. I guess -arch=sm_21 is shorthand for -arch=compute_20 -code=compute_20,sm_21 then. I never really went deep into CUDA programming anyway :P.
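The shorthand can be spelled out with a small helper. This is a sketch rather than anything nvcc ships: the function name `expand_arch` is hypothetical, and the rule that every other sm_XY pairs with compute_XY is an assumption that held for the architectures discussed here.

```shell
#!/bin/sh
# Hypothetical helper: print the explicit form of "nvcc -arch=sm_XY".
expand_arch() {
    real=$1
    case "$real" in
        sm_21) virt=compute_20 ;;           # no compute_21 exists; sm_21 builds on compute_20
        sm_*)  virt=compute_${real#sm_} ;;  # assumed: other sm_XY pair with compute_XY
        *)     echo "unsupported: $real" >&2; return 1 ;;
    esac
    echo "-arch=$virt -code=$real,$virt"
}

expand_arch sm_21   # prints: -arch=compute_20 -code=sm_21,compute_20
```

Listing the virtual architecture in `-code` as well keeps the PTX in the binary, so newer GPUs can still JIT-compile it.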

❯ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Wed_Jul_17_18:36:13_PDT_2013
Cuda compilation tools, release 5.5, V5.5.0

Makefile:1019

# NOTE: now compiling for compute 1.0 again, as it's using less power and runs way faster on Linux
fermi_kernel.o: fermi_kernel.cu
    $(NVCC) -g -O2 -Xptxas "-abi=no -v" -arch=sm_21 --maxrregcount=64 $(JANSSON_INCLUDES) -o $@ -c $<

ImmortalJ commented 10 years ago

I got a slight performance improvement with this on my GTX 570: it went from 240 KH/s to 244 KH/s. Maybe it makes a bigger difference with smaller launch configurations, but of course I will take the ~4 KH/s gain.

Thanks!