clawpack / clawutil

General utility programs
BSD 3-Clause "New" or "Revised" License
10 stars 31 forks source link

update cuda version for doppio to use cuda 10.0 #141

Open xinshengqin opened 4 years ago

rjleveque commented 4 years ago

Rather than hardwiring in different CUDA versions for different computer names, would it be better to have a variable CUDA_VERSION or some such thing that a user could set as an environment variable? Most potential users won't have access to doppio or other machines listed in the Makefile.

xinshengqin commented 4 years ago

Updated. How does this look like?

rjleveque commented 4 years ago

Setting

CUDA_VER ?= $(CUDA_VERSION)

means to set CUDA_VER to the value of the environment variable CUDA_VERSION only if CUDA_VER is itself not set as an environment variable. This doesn't seem like what you want since the user wouldn't have two such environment variables.

Maybe you want to first set CUDA_VERSION to some particular version if you are on certain machines and then the above command would over-ride this if the user set an environment variable (which hopefully they did if they are on a different machine than any listed).

And then always do

ALL_FFLAGS   += -Mcuda=$(CUDA_VER),maxregcount:80 -Mcuda=lineinfo -Mcuda=rdc

so that the value set as CUDA_VER is always used regardless of what machine.

mandli commented 4 years ago

Is there not a canonical way to do this? I looked a bit and it looks like you can run nvcc --version as mentioned here.

xinshengqin commented 4 years ago

Is there not a canonical way to do this? I looked a bit and it looks like you can run nvcc --version as mentioned here.

One can do nvcc --version to get something like

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

But then you have to parse these strings to get what you want. In this case it's "10.0".

Also the cuda compiler does not define an environment variable like CUDA_VERSION for us.

What @rjleveque suggested looks like a simple approach with the caveat that user will need to set this in their bashrc file.

What option do you guys think we should use, 1) parse the string return by "nvcc --version"; 2) use environment variable set by the user?

rjleveque commented 4 years ago

I don't think we should try to parse the version string ourselves. Setting a variable with CUDA_VERSION ?= seems like the best approach to me, setting it to some default value (maybe dependent on machine although that's not useful to most users). And then the user can either set an environment variable to over-ride this, or just set CUDA_VERSION = ... in their own application Makefile which would also over-ride the default set in Makefile.common.gpu, I believe.

One way or another I think the user is responsible for setting this and we should make it easy for them to do so in their own Makefile and/or as an environment variable without having to modify Makefile.common.gpu.

rjleveque commented 4 years ago

Also it looks like there are some other things on the same ALL_FLAGS line as cuda10.1 that are machine-dependent, e.g. cc35 vs cc60?  What is this and does it also need to be an environment variable?

xinshengqin commented 4 years ago

Sounds good. I will go ahead to use this approach. cc35 and cc60 are "compute capabilities". Those can be interpreted as version numbers that indicate what features are supported. Different generations of GPU has different compute capabilities.

More details can be seen at: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities

xinshengqin commented 4 years ago

I updated the makefile to use CUDA_VERSION and CUDA_CC.

CUDA_VERSION should be set to the version "nvcc --version" returns. CUDA_CC should be set to the compute capability supported by the machine you compile this code on.

The users are expected to set the two environment variables. They will be passed to the compiler as compilation flags: ALL_FFLAGS += -Mcuda=cuda$(CUDA_VERSION),$(CUDA_CC)

mandli commented 4 years ago

I would be a bit cautious about adding this to something like Makefile.common even thought it is in a “GPU” version. Why not add those variables to the local makefile?

rjleveque commented 4 years ago

Maybe we should have the application point to a new common Makefile for the GPU, e.g.

CLAWMAKE = $(CLAW)/clawutil/src/Makefile.common.gpu

so there could be a common one for gpu applications without messing with the standard one.

Or maybe better to just have in the application Makefile as @mandli suggests, which might make it easier for the user to see exactly what's being set and modify it appropriately.

xinshengqin commented 4 years ago

You mentioned two options for this case: 1) have the application point to a new common Makefile for the GPU. Users need to set variables like CUDA_PATH in their bash environment. 2) have user set variables like CUDA_PATH in their local Makefile.

For option 1, users don't have to set it multiple times for every example. Option 2 is easier for the user to see what's being set or what they need to set.

I can't come up with other reasonings here to compare the two. But this is an important design decision to make.

xinshengqin commented 4 years ago

Any thoughts on which of the two option we should choose?

mandli commented 4 years ago

I tend to use the CUDA_PATH ?= default in the local Makefiles for times like this where you can provide a default value but also override it in the environment.

xinshengqin commented 4 years ago

Just updated it to use variables set in local Makefiles. Relevant changes are also made in: https://github.com/clawpack/amrclaw/pull/259 https://github.com/clawpack/geoclaw/pull/438