Problems Getting MC-GPU Running (Unspecified Launch Failure 4)

GoogleCodeExporter commented 9 years ago

Hello,

I am currently trying to get MC-GPU up and running so that I am able to change 
the code to simulate inverse geometry systems. I am new to linux and running 
into a few problems. I understand that these may be trivial, but it never hurts 
to ask. 

First off, let me explain my situation.

- Using a Lenovo y400 laptop with an Nvidia 650m
- Running Ubuntu 12.10
- Installed Cuda-5.0; all samples compiled, and I ran quite a few to test that 
  they worked
- Using the proprietary driver, not the dev-driver that came with Cuda-5.0
- Wrote a quick "Hello world" cuda program that compiled and ran
- I compile with the lines given in the code to create MC-GPU_v1.3.x and run 
  the simple geometry using ../MC-GPU_v1.3.x MC-GPU_v1.3_6voxels.in | tee 
  MC-GPU_v1.3_6voxels.out

I can compile and run the simple geometry code using the CPU compilation, but 
I cannot get the GPU part to work.

I switch to the console and disable the X server by calling service lightdm 
stop and init 3. When I then run the code, I get all the printouts up to the 
point where it states: starting the Monte Carlo Loop Phase. It then tells me 
that I am executing 7813 blocks of 128 threads with 100 histories in each 
thread, for a total of 100006400 histories. After this output, I get an error 
from line 891 in MC-GPU_v1.3.cu: !!Kernel execution failed while simulating 
particle tracks!!  : (4) unspecified launch failure.
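
The launch figures quoted above follow from simple arithmetic. Below is a minimal Python sketch of it, assuming that about 10^8 histories were requested (the thread does not state the requested number) and that the block count is rounded up. The function name `launch_size` is illustrative and is not part of MC-GPU.

```python
def launch_size(total_histories, threads_per_block, histories_per_thread):
    """Return (blocks, simulated_histories) for a one-dimensional launch."""
    per_block = threads_per_block * histories_per_thread
    # Integer ceiling division: round up so that at least the requested
    # number of histories is simulated.
    blocks = (total_histories + per_block - 1) // per_block
    return blocks, blocks * per_block


# Assumed request of 10**8 histories, 128 threads/block, 100 histories/thread.
blocks, simulated = launch_size(10**8, 128, 100)
print(blocks, simulated)  # 7813 100006400
```

The rounding explains why the reported total (100006400) is slightly larger than a round 10^8.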

I assume the error occurs where the code first tries to access the memory of 
the GPU, but I am unsure why it happens. Since I am running a sample, I am 
hoping that this is a simple problem of not compiling something correctly or 
of missing a step in attempting the simulation. Please let me know if anyone 
has time to help or if I should share any other information.

Thanks,

Dave

Original issue reported on code.google.com by DAPDunke...@gmail.com on 22 Apr 2013 at 7:30

GoogleCodeExporter commented 9 years ago

Hi Dave,

The error you describe is very generic. I would need more information to be 
able to understand what is going on. I can tell you that this same code has 
worked well in many different systems.

This "unspecified launch failure" simply means that the GPU kernel failed. 
Excluding coding errors, the possible causes might be an error in the memory 
allocation, launching too many CUDA threads, a problem in the installation of 
CUDA, etc.

I would recommend first trying to simulate a smaller number of histories 
(reduce the number of threads), and then trying to reduce the amount of shared 
and global memory used (reduce the values of MAX_NUM_PROJECTIONS and 
MAX_MATERIALS in MC-GPU_v1.3.h).

Make sure that the allocated shared and constant memories (given during the 
compilation) are not larger than the resources in your GPU.

Did the program work with these modifications?

Original comment by andre...@gmail.com on 29 Apr 2013 at 7:55

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

I did get it to work by modifying the number of threads and particles 
simulated. I think I was limited on the memory available because it is a laptop 
GPU. Thank you very much for the advice!

Original comment by DAPDunke...@gmail.com on 30 Apr 2013 at 3:24

GoogleCodeExporter commented 9 years ago

Good to know it works!
You can simulate as many particles as you want, but increase the number of 
particles simulated by each thread so that the total number of CUDA blocks and 
threads can be handled by your GPU. 
The code automatically increases the number of particles per thread to keep the 
number of blocks below 65000, but even this may be too much for a laptop GPU.
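
The idea of raising the histories per thread until the block count fits can be sketched as follows. This is only an illustration in Python and not MC-GPU's actual logic: the function name, the doubling step, and the `max_blocks` parameter are assumptions made for the example.

```python
def fit_histories_per_thread(total_histories, threads_per_block,
                             histories_per_thread, max_blocks=65000):
    """Raise histories_per_thread until the block count is below max_blocks."""
    per_thread = histories_per_thread
    while True:
        per_block = threads_per_block * per_thread
        blocks = (total_histories + per_block - 1) // per_block  # ceiling
        if blocks < max_blocks:
            return per_thread, blocks
        per_thread *= 2  # doubling is an arbitrary choice for this sketch


# 10**10 histories do not fit at 100 histories per thread (781250 blocks),
# so the per-thread count is doubled until the block count drops below 65000.
print(fit_histories_per_thread(10**10, 128, 100))  # (1600, 48829)
```

A request that already fits is left unchanged: `fit_histories_per_thread(10**8, 128, 100)` returns `(100, 7813)`.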

Original comment by andre...@gmail.com on 30 Apr 2013 at 8:55

Changed state: Answered

GoogleCodeExporter commented 9 years ago

Hi,

I am finding myself in the situation that Dave described here in Issue 15 a 
while ago. When running the sample simulation with 6 voxels I get the same 
error with ‘(4) unspecified launch failure’. I did get it to work when 
decreasing the particles simulated to 10^7 and number of threads to 96, but I 
would like to be able to simulate a larger number of particles.

CUDA 5.0 is installed and seems to be working when running the CUDA samples. 
The computer has 64 GB RAM and a NVIDIA PNY Quadro 4000 2 GB GDDR5 (256 
parallel processing cores).

I would be grateful if you could help me understand if the error depends on a 
limitation in memory or if there is something else that may be the problem.

Thanks!

Original comment by hannie.p...@gmail.com on 14 Oct 2013 at 1:06

GoogleCodeExporter commented 9 years ago

This is a generic error, so I cannot tell what is failing.
You are probably requesting more resources than the GPU has. 
Reducing the number of threads per block usually solves this (reducing the 
number of histories is not necessary).
You may not have enough shared or constant memory (you can reduce this by 
editing the .h files). Look at how much memory the program uses in the output 
of the compilation script.
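
As a rough aid for reading the compilation output, the Python sketch below pulls the register and memory figures out of one resource-usage line and compares them with device limits. The sample line is hypothetical: it is written in the style of nvcc's verbose ptxas output, using the 63 registers and 3824 bytes of shared memory reported later in this thread, with a made-up constant-memory figure. The limits assume a compute capability 2.0 device (63 registers per thread, 49152 bytes of shared memory per block, 65536 bytes of constant memory); check the deviceQuery output for your own GPU.

```python
import re

# Hypothetical sample line; the cmem figure is invented for illustration.
SAMPLE = "ptxas info    : Used 63 registers, 3824 bytes smem, 64 bytes cmem[0]"

# Assumed limits for a compute capability 2.0 device.
LIMITS = {"registers": 63, "smem_bytes": 49152, "cmem_bytes": 65536}


def parse_ptxas(line):
    """Extract registers, shared memory and total constant memory (bytes)."""
    regs = re.search(r"Used (\d+) registers", line)
    smem = re.search(r"(\d+) bytes smem", line)
    cmem = re.findall(r"(\d+) bytes cmem\[\d+\]", line)
    return {
        "registers": int(regs.group(1)) if regs else 0,
        "smem_bytes": int(smem.group(1)) if smem else 0,
        "cmem_bytes": sum(int(c) for c in cmem),  # sum over all cmem banks
    }


def over_limit(usage, limits=LIMITS):
    """Return the names of the resources that exceed their limit."""
    return [name for name, value in usage.items() if value > limits[name]]


usage = parse_ptxas(SAMPLE)
print(usage)
print(over_limit(usage))  # an empty list means every figure is within limits
```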

To understand the resources you are using, open the "CUDA Occupancy Calculator" 
Excel spreadsheet provided with CUDA and enter the amount of memory and the 
number of threads you use. Then compare with the resources in your GPU.

Another important thing is that you MUST compile the executable for the 
appropriate Compute Capability available in your GPU. Edit the makefile script 
to change the compute capability (more info in the CUDA user guide; run 
./deviceQuery sample code to get all info about your GPU).

Finally, make sure you don't use graphics with the GPU, either by switching to 
command line mode or (better) by plugging the monitor into another GPU. 

Good luck!

Original comment by andre...@gmail.com on 14 Oct 2013 at 5:22

GoogleCodeExporter commented 9 years ago

First, you write that I must edit the makefile script for the appropriate 
compute capability. How do I do that?

And I am afraid I would need more help to understand the Occupancy Calculator 
and use it accurately. I believe my compute capability is 2.0 and the shared 
memory size is 49152 bytes, but I am not sure how to interpret my resource 
usage from the compilation script output 
(https://www.dropbox.com/s/9wjioonne2dygfc/output.pdf). It says that two entry 
functions are compiled for 'sm_13', 'sm_20' and 'sm_30'. Should the registers 
and shared memory used in the different steps be added together, and should I 
also account for 'an external shared memory array' mentioned (as 2048 bytes) 
in the Help sheet of the Occ. Calculator?

If I state a number of Registers Per Thread larger than 63 (which is the last 
mentioned under 'Function properties' in the compilation output), the occupancy 
is calculated as 0. Does this mean I have to reduce the used registers in some 
way?

When reducing MAX_MATERIALS in MC-GPU_v1.3.h I get a lower shared memory 
usage, but I find no effect of reducing MAX_NUM_PROJECTIONS.

If it is correct to use the last compilation output, 63 registers and 3824 
bytes smem (MAX_MATERIALS = 10), and Threads Per Block is set to (for example) 
32, then I get occupancy 17 %. Even if it is 'Limited by Max Warps or Max 
Blocks per Multiprocessor' it means it would work if this was the case, does it 
not?
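
The 17 % figure can be reproduced with a simplified occupancy estimate. The Python sketch below assumes the per-multiprocessor limits of a compute capability 2.0 device (48 resident warps, 8 resident blocks, 32768 registers, 49152 bytes of shared memory). It ignores the allocation granularity that the real Occupancy Calculator applies, so results for other inputs may differ from the spreadsheet. The function name is illustrative.

```python
# Assumed per-multiprocessor limits for compute capability 2.0.
MAX_WARPS = 48
MAX_BLOCKS = 8
REGISTERS = 32768
SHARED_BYTES = 49152
WARP_SIZE = 32


def occupancy(threads_per_block, regs_per_thread, smem_per_block):
    """Return (resident blocks per multiprocessor, occupancy fraction)."""
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    by_warps = MAX_WARPS // warps_per_block
    by_regs = REGISTERS // (regs_per_thread * warps_per_block * WARP_SIZE)
    by_smem = SHARED_BYTES // smem_per_block if smem_per_block else MAX_BLOCKS
    # The tightest of the four limits decides how many blocks are resident.
    blocks = min(MAX_BLOCKS, by_warps, by_regs, by_smem)
    return blocks, blocks * warps_per_block / MAX_WARPS


blocks, occ = occupancy(32, 63, 3824)
print(blocks, f"{occ:.0%}")  # 8 17%
```

With 32 threads per block each block holds a single warp, so the limit of 8 blocks per multiprocessor caps the result at 8 of 48 warps. A non-zero value indicates that at least one block fits on a multiprocessor under these assumptions; a low value points to reduced performance rather than to a failed launch.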

I apologize if my questions make no sense, I am not really sure of what I am 
asking. Maybe you can think of something that might steer me in the right 
direction. Thank you!

Original comment by hannie.p...@gmail.com on 17 Oct 2013 at 11:32

GoogleCodeExporter commented 9 years ago

The compute capability to be compiled is specified in the Makefile. Currently 
it compiles for 1.3, 2.0 and 3.0 (you can change this by changing the CFLAG 
parameters ending in _13, _20 or _30). This should work well in your GPU.
The shared memory use and occupancy are important to get maximum performance, 
but they are not important if the code does not work at all. 
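
To illustrate how a compute capability maps to the _13, _20 and _30 suffixes, the Python sketch below builds the corresponding nvcc `-gencode` options. It assumes the Makefile passes options of this standard nvcc form; the exact variable names and layout in the MC-GPU Makefile may differ, and the helper name is hypothetical.

```python
def gencode_flag(compute_capability):
    """Turn a compute capability such as "2.0" into an nvcc -gencode option."""
    major, minor = compute_capability.split(".")
    tag = f"{major}{minor}"  # "2.0" becomes "20"
    return f"-gencode arch=compute_{tag},code=sm_{tag}"


for cc in ("1.3", "2.0", "3.0"):
    print(gencode_flag(cc))
# -gencode arch=compute_13,code=sm_13
# -gencode arch=compute_20,code=sm_20
# -gencode arch=compute_30,code=sm_30
```

For a Quadro 4000 reporting compute capability 2.0, the entry ending in _20 is the relevant one.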

Since I know that the code is able to work (I have used it many times with many 
different systems and GPUs), the problem must be in the configuration of your 
workstation and CUDA system. 

Try to compile and run a sample CUDA application first, and then try to run 
MC-GPU.
Use a Linux system without X windows, just the command line.

The CUDA documentation is very helpful. You can find the documentation in the 
CUDA installation folder (e.g., /usr/local/cuda/doc/pdf/).
Read especially:
   - CUDA_C_Programming_Guide.pdf
   - CUDA_Getting_Started_Guide_For_Microsoft_Windows.pdf (or _For_Linux.pdf)

Original comment by andre...@gmail.com on 17 Oct 2013 at 4:02
