charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

CUDA build does not correctly find cuda location #756

Closed mprobson closed 4 years ago

mprobson commented 9 years ago

Original issue: https://charm.cs.illinois.edu/redmine/issues/756


tl;dr: we need to change how the build script detects CUDA's location OR tell people to ensure that CUDATOOLKIT_HOME is set correctly (it does not appear to be an environment variable that CUDA sets by default).

Several builds on the campus cluster were failing when built with the cuda option. I had loaded CUDA using:

module load cuda

but the build was still failing. I noticed the following lines at the top of the build's output:

[mprobson@taubh2 charm]$ ./build charm++ netlrts-linux-x86_64 cuda smp
checking for CUDA toolkit directory
CUDA_DIR=/usr/local/cuda/

With some grep handiwork:

[mprobson@taubh2 charm]$ grep -rn "checking for CUDA toolkit directory" *
grep: VERSION: No such file or directory
build:451:  echo "checking for CUDA toolkit directory"
grep: include: No such file or directory

And on line 451 of build:

451   echo "checking for CUDA toolkit directory"
452   CUDA_CANDIDATE_DIRS="$CUDATOOLKIT_HOME /usr/local/cuda /usr/lib/nvidia-cuda-toolkit"

Each of those directories is checked for existence, and if one exists, CUDA_DIR is set to it. The problem on the campus cluster is that each CUDA version has its own subdirectory inside /usr/local/cuda/, e.g. /usr/local/cuda/6.5. Because /usr/local/cuda itself exists, the build script accepts it as the toolkit directory even though the toolkit actually lives one level down, and the build breaks. The current workaround is to set the non-standard CUDATOOLKIT_HOME environment variable and then build. I'm not sure whether we need to change that line in build, or whether we should just ensure that users and vendors set that variable.
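A minimal sketch of a stricter check, assuming a POSIX shell and that a usable toolkit root contains an executable bin/nvcc. `find_cuda_dir` is a hypothetical helper, not code from the actual build script. Instead of accepting a candidate that merely exists, it accepts the first one that contains bin/nvcc, so a parent directory like /usr/local/cuda that only holds versioned subdirectories is skipped.

```shell
#!/bin/sh
# Hypothetical helper (not the real Charm++ build code): print the first
# candidate directory that looks like a CUDA toolkit root, i.e. contains an
# executable bin/nvcc. Returns non-zero if no candidate qualifies.
find_cuda_dir() {
  for dir in "$@"; do
    # Skip empty entries, e.g. an unset CUDATOOLKIT_HOME passed in quotes.
    [ -n "$dir" ] || continue
    # A bare existence test ([ -d "$dir" ]) would accept /usr/local/cuda on
    # the campus cluster, where the toolkit is really in /usr/local/cuda/6.5.
    if [ -x "$dir/bin/nvcc" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}
```

It would be called as `CUDA_DIR=$(find_cuda_dir "$CUDATOOLKIT_HOME" /usr/local/cuda /usr/lib/nvidia-cuda-toolkit)`. On its own this still would not find /usr/local/cuda/6.5, since versioned subdirectories are not candidates, so CUDATOOLKIT_HOME would still be needed there; the check only keeps the script from settling on the wrong directory.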

mprobson commented 5 years ago

Original date: 2015-08-19 21:09:18


Add the bit about CUDATOOLKIT_HOME to the manual, either in the debugging section or in a (newly created) troubleshooting section.

ericjbohm commented 5 years ago

Original date: 2016-09-21 18:37:07


Is this still a problem?

stwhite91 commented 5 years ago

Original date: 2017-02-01 18:33:55


Any update?

mprobson commented 5 years ago

Original date: 2018-03-14 19:14:09


Renewing an old issue: https://charm.cs.illinois.edu/gerrit/#/c/2048/ https://github.com/UIUC-PPL/charm/commit/2be81b3d1fc354ecb4ad405f8b00ddc9fd4bde00

stwhite91 commented 5 years ago

Original date: 2018-03-15 20:29:26


What does that documentation patch have to do with this issue?

stwhite91 commented 5 years ago

Original date: 2018-03-15 20:32:03


Not a release blocker

mprobson commented 5 years ago

Original date: 2018-03-20 19:38:02


Sam White wrote:

What does that documentation patch have to do with this issue?

From the Description:

tl;dr we need to change how the build script detects cuda's location OR tell people to ensure that CUDATOOLKIT_HOME is set correctly

So this is my first-cut fix, i.e. documenting the problem in the manual. I plan to do the other half of that OR as well. One change I'm currently working on is making the build fail more obviously when it doesn't find CUDA.
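One way the "fail more obviously" idea could look, as a sketch only: `require_cuda_dir` is a hypothetical POSIX-shell helper, not the actual patch. It stops with an explicit error that names CUDATOOLKIT_HOME when the selected directory has no executable bin/nvcc, instead of letting compilation fail later with a less obvious message.

```shell
#!/bin/sh
# Hypothetical helper: validate a chosen CUDA directory and fail loudly.
require_cuda_dir() {
  cuda_dir=$1
  if [ -z "$cuda_dir" ] || [ ! -x "$cuda_dir/bin/nvcc" ]; then
    # Explain what went wrong and how to fix it, on stderr.
    echo "ERROR: no CUDA toolkit found at '${cuda_dir:-<unset>}'." >&2
    echo "Set CUDATOOLKIT_HOME to the directory that contains bin/nvcc." >&2
    return 1
  fi
  # Same message format the build already prints on success.
  echo "CUDA_DIR=$cuda_dir"
}
```

In a build script this would be used as `require_cuda_dir "$CUDA_DIR" || exit 1`, so a wrong guess such as /usr/local/cuda aborts immediately with a hint rather than partway through the build.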

minitu commented 4 years ago

The current build script looks for nvcc and uses its path to determine the CUDA directory.
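A minimal sketch of nvcc-based detection, assuming nvcc is on PATH (e.g. after `module load cuda`) and sits in the usual <toolkit root>/bin directory. `cuda_dir_from_nvcc` is a hypothetical name and this is not the script's actual code; it simply strips the trailing /bin/nvcc from the path that `command -v` reports.

```shell
#!/bin/sh
# Hypothetical helper: derive the CUDA toolkit root from the nvcc on PATH.
# Assumes the layout <root>/bin/nvcc; symlinks are not resolved.
cuda_dir_from_nvcc() {
  nvcc_path=$(command -v nvcc) || return 1
  # <root>/bin/nvcc -> <root>/bin -> <root>
  dirname "$(dirname "$nvcc_path")"
}
```

Assuming `module load cuda` puts the selected version's bin directory on PATH, this would yield /usr/local/cuda/6.5 rather than /usr/local/cuda on the campus cluster, without requiring CUDATOOLKIT_HOME.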