HiFiLES / HiFiLES-solver

High Fidelity Large Eddy Simulation Solver

error in compilation of GPU version #99

Closed: popstar0426 closed this issue 8 years ago

popstar0426 commented 8 years ago

Hi:

I tried to compile HiFiLES on a cluster (which has Tesla K20m cards installed). However, I got the following error:

nvcc -D_GPU -I/usr/local/cuda-6.0/include/ -gencode=arch=compute_30,code=sm_30 -Xcudafe "--diag_suppress=subscript_out_of_range" -c -o ../src/cuda_kernels.o ../src/cuda_kernels.cu
/bin/sh: nvcc: command not found
make[1]: *** [../src/cuda_kernels.o] Error 127
make[1]: Leaving directory `/export/home/wangy/HiFiLES-solver/obj'
make: *** [all-recursive] Error 1

My configure_run.sh is attached below.

Basic User-Modifiable Build Settings [Change these as desired]

NODE="GPU" # CPU or GPU CODE="RELEASE" # DEBUG or RELEASE BLAS="NO" # ATLAS, STANDARD, ACCLERATE, or NO PARALLEL="NO" # YES or NO TECIO="NO" # YES or NO METIS="YES" # Build & link to the HiFiLES-supplied ParMETIS libraries? YES or NO

---------------------------------------------------------------

Compiler Selections [Change compilers or add full filepaths if needed]

CXX="g++" # C++ compiler - Typically g++ (default, GNU) or icpc (Intel) NVCC="nvcc" # NVidia CUDA compiler MPICC="mpicxx" # MPI C compiler

---------------------------------------------------------------

Library & Header File Locations [Change filepaths as needed]

BLAS_LIB="/usr/local/atlas/lib"
BLAS_INCLUDE="/usr/local/atlas/include"

TECIO_LIB="lib/tecio-2008/lib"
TECIO_INCLUDE="lib/tecio-2008/include"

If building the supplied ParMETIS libraries, need the MPI header location

MPI_INCLUDE="/export/home/wangy/software/openmpi/include" # location of mpi.h

If NOT building the supplied ParMetis library, location of installed libraries

PARMETIS_LIB="/usr/local/lib"          # location of libparmetis.a
PARMETIS_INCLUDE="/usr/local/include"  # location of parmetis.h

METIS_LIB="/usr/local/lib"             # location of libmetis.a
METIS_INCLUDE="/usr/local/include"     # location of metis.h

GPU Architecture Selection: -gencode=arch=compute_xx,code=sm_xx (default: 20)

compute_10 Basic features

compute_11 + atomic memory operations on global memory

compute_12 + atomic memory operations on shared memory

+ vote instructions

compute_13 + double precision floating point support

compute_20 + Fermi support

compute_30 + Kepler support

CUDA_ARCH="30" CUDA_LIB="/usr/local/cuda-6.0/lib64/" CUDA_INCLUDE="/usr/local/cuda-6.0/include/"

Best regards!

JacobCrabill commented 8 years ago

Here's the key issue in your output:

/bin/sh: nvcc: command not found

This is your terminal saying it doesn't know what "nvcc" is. You'll need to either add the location of nvcc to your PATH environment variable, or put the full path to nvcc in the makefile.
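For example, a minimal sketch assuming the CUDA 6.0 install paths already shown in your configure script (adjust for your installation):

export PATH=/usr/local/cuda-6.0/bin:$PATH   # make nvcc visible to the shell that make invokes
which nvcc                                  # sanity check: should print the full path to nvcc

Alternatively, hard-code the full path in configure_run.sh:

NVCC="/usr/local/cuda-6.0/bin/nvcc"   # instead of NVCC="nvcc"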

popstar0426 commented 8 years ago

Hi: Thank you! Now I can run simulations with the CPU version using the command "mpirun -np 4 HiFiLES inputfile". How can I run simulations with GPUs?

Best regards!

mlopez14 commented 8 years ago

Hope this helps!

https://github.com/HiFiLES/HiFiLES-solver/wiki/Execution

Let us know if it doesn't so we can improve it.


popstar0426 commented 8 years ago

Hi: I have a question about the difference in HiFiLES execution between CPU and GPU. As I understand it, there are many cores in a GPU card; for example, a K20m card has 2496 small cores inside. Question: when I run HiFiLES with "HiFiLES inputfile", will only one small core be used, or all 2496 small cores in one K20m?

Best regards! Yue

mlopez14 commented 8 years ago

The kernels ask the GPU to perform a compute-intensive task and tell it how to subdivide that task among its cores. The actual number of cores used will depend on the workload given to the GPU.

In any case, as you run HiFiLES you don't need to worry about how many GPU cores will be active; up to all of the cores in the GPU will be used.

In practice, running the GPU version with mpirun -n 1 [...] will use 1 CPU process, which controls 1 GPU, and that 1 GPU will use its many cores as needed.
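A minimal sketch of that pattern (the input file name is a placeholder, and how ranks get mapped to distinct devices depends on your cluster setup):

mpirun -n 1 HiFiLES input_file   # 1 CPU process driving 1 GPU
mpirun -n 4 HiFiLES input_file   # 4 processes, intended as one per GPU on a multi-GPU node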

popstar0426 commented 8 years ago

Hi: There are 12 K20m GPUs in our cluster. If I run the GPU version with "mpirun -n 12 HiFiLES input_file", does that mean 12 CPUs and 12 GPUs are used in the simulation? That is, the 12 CPU processes are asked to control the 12 GPUs, am I right?

Best regards!

mlopez14 commented 8 years ago

Correct.


popstar0426 commented 8 years ago

Hi:

Thank you. It is much clearer now. There was another warning, related to physical memory. A K20m GPU only has 4 GB of physical memory, but my simulation requires much more than 4 GB. Have you seen such a warning? How do you solve it? Best regards!

mlopez14 commented 8 years ago

For now, you would have to reduce your simulation size or use more GPUs.

It is possible to stream the data through the GPU and have it processed a piece at a time, but this is not yet implemented in HiFiLES.
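As a side note, one way to see how much memory each device has and how much a running job is using is nvidia-smi; a sketch using its standard query flags (available fields depend on your driver version):

nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv   # one line per GPU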


popstar0426 commented 8 years ago

Hi: The warning is attached below:

[wangy@gpu02 SD7003]$ mpirun -n 12 /export/home/wangy/HiFiLES-solver/bin/HiFiLES input_sd7003_visc

WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of physical memory that can be registered. You should investigate the relevant Linux kernel module parameters that control how much physical memory can be registered, and increase them to allow registering all physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host: gpu02
Registerable memory: 4096 MiB
Total memory: 65507 MiB

Your MPI job will continue, but may behave poorly and/or hang.

[HiFiLES ASCII-art banner]

Aerospace Computing Laboratory (Stanford University)
my_rank=9 my_rank=1 my_rank=5 my_rank=10 my_rank=11 my_rank=8

---------------------- Non-dimensionalization ---------------------
uvw_ref: 34.7128 rho_free_stream: 0.0323339
rho_c_ic=1 u_c_ic=1 v_c_ic=0 w_c_ic=0 mu_c_ic=1.66667e-05
my_rank=0

----------------------- Mesh Preprocessing ------------------------
reading connectivity ... my_rank=2 my_rank=3 my_rank=4 my_rank=6 my_rank=7
done reading connectivity
Before parmetis
[ 54400 297600 4533 4537] [300] [ 0.000] [ 0.000]
[ 29189 220998 2420 2446] [300] [ 0.000] [ 0.000]
[ 15706 142906 1295 1320] [300] [ 0.000] [ 0.000]
[ 8469 87232 697 716] [300] [ 0.000] [ 0.000]
[ 4569 49942 374 388] [300] [ 0.000] [ 0.000]
[ 2488 27352 204 212] [300] [ 0.000] [ 0.001]
[ 1370 14398 110 120] [300] [ 0.000] [ 0.001]
[ 760 7314 59 66] [300] [ 0.000] [ 0.002]
[ 427 3620 31 38] [300] [ 0.000] [ 0.004]
[ 357 2980 27 34] [300] [ 0.000] [ 0.005]
nvtxs: 357, cut: 6349, balance: 1.030
nvtxs: 427, cut: 6050, balance: 1.045
nvtxs: 760, cut: 5905, balance: 1.054
nvtxs: 1370, cut: 5690, balance: 1.050
nvtxs: 2488, cut: 5323, balance: 1.047
nvtxs: 4569, cut: 5063, balance: 1.042
nvtxs: 8469, cut: 4707, balance: 1.034
nvtxs: 15706, cut: 4321, balance: 1.029
nvtxs: 29189, cut: 3917, balance: 1.029
nvtxs: 54400, cut: 3458, balance: 1.027
Setup: Max: 0.005, Sum: 0.060, Balance: 1.004
Matching: Max: 0.005, Sum: 0.060, Balance: 1.003
Contraction: Max: 0.003, Sum: 0.039, Balance: 1.010
InitPart: Max: 0.009, Sum: 0.112, Balance: 1.000
Project: Max: 0.000, Sum: 0.001, Balance: 1.181
Initialize: Max: 0.001, Sum: 0.013, Balance: 1.033
K-way: Max: 0.006, Sum: 0.071, Balance: 1.001
Remap: Max: 0.000, Sum: 0.001, Balance: 1.029
Total: Max: 0.031, Sum: 0.378, Balance: 1.000
Final 12-way Cut: 3458 Balance: 1.027
After parmetis
reading vertices
done reading vertices
Setting up mesh connectivity
Done setting up mesh connectivity
reading boundary conditions
done reading boundary conditions

---------------- Flux Reconstruction Preprocessing ----------------
initializing elements
tris quads tets pris hexas
done initializing elements
setting elements shape ... done.
pre-computing nodal shape-basis functions ...
[gpu02:23252] 11 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
[gpu02:23252] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
done.

Question 1: Does that warning mean that physical memory is not enough?
Question 2: 12 K20m GPUs are being used, so why is the registerable memory still 4096 MiB rather than 4096*12 = 49152 MiB?
Question 3: At the bottom, it says 11 more processes have sent the help message "help-mpi-btl-openib.txt / reg mem limit low". What does this mean?

Best regards!

mlopez14 commented 8 years ago

I'm afraid all these warning and error messages are coming from the specific cluster you are using. However, they all seem to point to low available memory.

The total-memory figure might be including the memory available to the CPUs (it's about 16 GB more than expected).

You could try reducing the order in your simulation to see if the message disappears.
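For the registered-memory limit itself, the Open MPI FAQ linked in your output describes raising the Linux kernel module limits. A hedged sketch for Mellanox mlx4 hardware, following that FAQ (whether these parameters apply depends on your HCA and driver, and changing them requires root plus a driver reload):

# /etc/modprobe.d/mlx4_core.conf  (file name may differ on your distro)
# registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * page_size
# e.g. 2^24 * 2^1 * 4 KiB = 128 GiB, enough to cover the 64 GB node
options mlx4_core log_num_mtt=24 log_mtts_per_seg=1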


popstar0426 commented 8 years ago

Hi: The message was still there when I reduced the order to 1. I then switched to the "Taylor-Green vortex" test case, and the warning was still there, so you are right that all the warnings come from the cluster. Question 1: will this warning slow down the calculation? Question 2: how can I solve it?

Best regards!

mlopez14 commented 8 years ago

I don't think we can help much with troubleshooting your cluster.

1: It is not possible to run a case larger than what fits in the cluster.
2: Run a smaller case.


popstar0426 commented 8 years ago

Hi: Thank you very much.

Best regards!

popstar0426 commented 8 years ago

Hi:

The warning disappeared. However, there is a problem with duplicated output.
When I run "mpirun -n 2 HiFiLES inputfile", the output is shown below:

Taylor_Green_vortex]$ mpirun -n 2 HiFiLES input_TGV_SD_hex

[HiFiLES ASCII-art banner]

Aerospace Computing Laboratory (Stanford University)

[HiFiLES ASCII-art banner, printed a second time by the second rank]

Aerospace Computing Laboratory (Stanford University)

---------------------- Non-dimensionalization ---------------------
uvw_ref: 27.7703 rho_free_stream: 0.00105264
rho_c_ic=1 u_c_ic=0 v_c_ic=0 w_c_ic=0 mu_c_ic=0.000625

----------------------- Mesh Preprocessing ------------------------
reading connectivity ...

---------------------- Non-dimensionalization ---------------------
uvw_ref: 27.7703 rho_free_stream: 0.00105264
rho_c_ic=1 u_c_ic=0 v_c_ic=0 w_c_ic=0 mu_c_ic=0.000625

----------------------- Mesh Preprocessing ------------------------
reading connectivity ... done reading connectivity
reading vertices
done reading connectivity
reading vertices
done reading vertices
done reading vertices
Setting up mesh connectivity
Setting up mesh connectivity
Done setting up mesh connectivity
reading boundary conditions
Done setting up mesh connectivity
reading boundary conditions
done reading boundary conditions
done reading boundary conditions

---------------- Flux Reconstruction Preprocessing ----------------
initializing elements
tris quads tets pris hexas
Initializing hexas

---------------- Flux Reconstruction Preprocessing ----------------
initializing elements tris quads tets pris hexas Initializing hexas
done initializing elements
done initializing elements
setting elements shape ... done. pre-computing nodal shape-basis functions ...
setting elements shape ... done. pre-computing nodal shape-basis functions ... done.
setting element transforms ... at solution points 0.00%
done. setting element transforms ... at solution points 0.00%
9.99% 9.99% 19.97% 19.97% 29.96% 29.96% 39.94% 39.94% 49.93% 49.93% 59.91% 59.91% 69.90% 69.90% 79.88% 79.88% 89.87% 89.87% 99.85%
at flux points 0.00% 99.85%
at flux points 0.00% 9.99% 9.99% 19.97% 19.97% 29.96% 29.96% 39.94% 39.94% 49.93% 49.93% 59.91% 59.91% 69.90% 69.90% 79.88% 79.88% 89.87% 89.87% 99.85%
done. initializing grid velocity to 0 ... done.
99.85% done. initializing grid velocity to 0 ... done.
setting element transforms at interface cubpts ... done.
setting element transforms at volume cubpts ...
setting element transforms at interface cubpts ... done.
setting element transforms at volume cubpts ...
Setting initial conditions... Setting initial conditions...
Writing Paraview file TGV_SD_hex_000000000 ...
Writing Paraview file TGV_SD_hex_000000000 ... done.

done.

Iter  Res[Rho]    Res[RhoVelx]  Res[RhoVely]  Res[RhoVelz]  Res[RhoE]   Fx_Total    Fy_Total    Fz_Total
1 0.00007467 0.05070704 0.05070704 0.06454648 0.11999195 0.00000000 0.00000000 0.00000000

Iter  Res[Rho]    Res[RhoVelx]  Res[RhoVely]  Res[RhoVelz]  Res[RhoE]   Fx_Total    Fy_Total    Fz_Total
1 0.00007467 0.05070704 0.05070704 0.06454648 0.11999195 0.00000000 0.00000000 0.00000000
2 0.00004771 0.05070449 0.05070450 0.06454861 0.11841124 0.00000000 0.00000000 0.00000000
2 0.00004771 0.05070449 0.05070450 0.06454861 0.11841124 0.00000000 0.00000000 0.00000000
3 0.00003346 0.05070269 0.05070269 0.06455042 0.11784944 0.00000000 0.00000000 0.00000000
3 0.00003346 0.05070269 0.05070269 0.06455042 0.11784944 0.00000000 0.00000000 0.00000000
4 0.00003099 0.05070137 0.05070138 0.06455245 0.11741138 0.00000000 0.00000000 0.00000000
4 0.00003099 0.05070137 0.05070138 0.06455245 0.11741138 0.00000000 0.00000000 0.00000000
5 0.00003362 0.05070048 0.05070049 0.06455424 0.11683635 0.00000000 0.00000000 0.00000000
5 0.00003362 0.05070048 0.05070049 0.06455424 0.11683635 0.00000000 0.00000000 0.00000000
6 0.00003527 0.05070013 0.05070012 0.06455563 0.11608773 0.00000000 0.00000000 0.00000000
6 0.00003527 0.05070013 0.05070012 0.06455563 0.11608773 0.00000000 0.00000000 0.00000000
7 0.00003566 0.05070023 0.05070022 0.06455659 0.11515537 0.00000000 0.00000000 0.00000000
7 0.00003566 0.05070023 0.05070022 0.06455659 0.11515537 0.00000000 0.00000000 0.00000000
8 0.00003531 0.05070077 0.05070075 0.06455717 0.11404470 0.00000000 0.00000000 0.00000000
8 0.00003531 0.05070077 0.05070075 0.06455717 0.11404470 0.00000000 0.00000000 0.00000000