Open: saltynexus opened this issue 6 years ago
Dear @saltynexus,
sorry for the late reply! Thank you very much for the effort you've put into getting GPUSPH running on the TX2, and for the extremely detailed report of your results. We have just made the "next" branch public. If you're still working on this topic, do you think you could find the time to contribute the necessary changes (starting with the Makefile fix), and to check whether the improvements introduced in the new branch also fix the other issues you've come across?
Currently I do not have an Nvidia GPU installed on my PC, nor do I have the capacity for one. I do, however, have an Nvidia Jetson TX2, an embedded platform hosting a CUDA-enabled GPU (compute capability 6.2). On the Jetson, I have CUDA 9.0 installed via JetPack 3.2.1 (see Nvidia's website on embedded systems for more info). Based on the dependencies for running GPUSPH, I believe I meet the necessary requirements. Here is the device info reported by the deviceQuery sample:
~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
I realize this may be the first you've heard of someone trying to run GPUSPH on this type of system, and I didn't expect it to run out of the box. Reading through the Makefile supports this assumption, as I don't see anything relating to embedded platforms. That being said, I took the initiative to adjust the Makefile in the hope that I could successfully compile and run the software.
Carefully reading through the Makefile and comparing the execution results of "shell" commands against my system, I was able to ensure that all the necessary "includes" and "libs" were found. The only adjustment that I had to make was here
This adjustment was made because the machine-dependent option "-m64" does not exist for AArch64, so appending it on the line below causes compile errors:
CXXFLAGS += $(TARGET_ARCH)
With that change, no machine-dependent option is passed to CXXFLAGS. I did, however, include the "-m64" option at the beginning of the nvcc-specific flags:
CUFLAGS += -m64
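Rather than hand-editing the flag per platform, the choice could be derived from the machine name. The following is only a minimal shell sketch, assuming `uname -m` reports `aarch64` on the TX2 and `x86_64` on a typical desktop. `target_arch_flag` is a hypothetical helper, not part of the GPUSPH Makefile, and whether the Makefile honours a command-line override of TARGET_ARCH is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of GPUSPH): print the GCC machine flag
# matching a `uname -m` string, or nothing when no such flag applies.
target_arch_flag() {
    case "$1" in
        x86_64|amd64) printf '%s\n' '-m64' ;;
        i?86)         printf '%s\n' '-m32' ;;
        *)            ;;  # e.g. aarch64, where GCC rejects -m64
    esac
}

# Show what would be chosen on the current machine.
echo "TARGET_ARCH=$(target_arch_flag "$(uname -m)")"

# Possible use (assumes the Makefile accepts TARGET_ARCH from the command line):
# make TARGET_ARCH="$(target_arch_flag "$(uname -m)")"
```

On AArch64 the helper prints nothing, so CXXFLAGS would receive no machine flag, which matches the manual adjustment described above.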
Following those adjustments, the code compiles successfully after running
make
Note that I'm following the "default" options for GPUSPH (i.e. dam break problem with defaults) just to see if I can get the software to run. When I run the executable (./GPUSPH), here is the output I receive
To get a better idea of what caused the failure, I ran
cuda-memcheck ./GPUSPH
Note that running this doesn't require any special compile options. Here is the output beginning with "Entering the main simulation cycle" (i.e. the output before that point matches that presented above):
Note that, for the sake of presentation, I've only shown the unique errors and removed those that repeat. Naturally, I started with the first error, located at "at 0x00000930 in /home/nvidia/GPUSPH/gpusph/src/cuda/forces_kernel.def:2359". Here is the code (in "forces_kernel.def") that the error refers to:
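As an aside, the reduction to unique errors mentioned above does not have to be done by hand. This is a rough shell sketch, assuming the cuda-memcheck report marks each error location with a line of the form "at 0x... in file:line", as in the location quoted above. `unique_error_sites` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a cuda-memcheck report on stdin and print each
# distinct "at 0x... in file:line" location once, prefixed by how often it
# occurred, most frequent first.
unique_error_sites() {
    grep -E 'at 0x[0-9a-fA-F]+ in ' \
        | sed -E 's/^[=[:space:]]*//' \
        | sort | uniq -c | sort -rn
}

# Possible use (merging stderr so nothing from the report is missed):
# cuda-memcheck ./GPUSPH 2>&1 | unique_error_sites
```

The sed step strips the leading "=========" marker and indentation so that identical locations compare equal before counting.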
Since the code was wrapped in an "if statement", I decided to try the alternative, which required that I change the definition in the "textures.cuh" source code to
In other words, I hard-coded it so that "PREFER_L1" would always evaluate to false. I read the comments in the code about L1 cache vs. shared memory, and I also noticed a related preference setting in the source file "cudautili.cu". I changed this as well to
Therefore, I'm basically testing the code with a shared-memory preference instead of an L1 cache preference. I ran "make clean", then recompiled the code via "make", and everything compiled as before. Running the code now (via ./GPUSPH) no longer produces the errors I was seeing before. Unfortunately, the simulation now blows up with the following output
This is as far as I got before I decided to reach out for help. Based on the results of the test above, I believe it might have something to do with the memory. Here are some rudimentary comments on the memory, based on what I found after some "google research"
Here are some further links: Jetson TX2 GPU memory; L1 cache vs shared memory
Of course, I realize I haven't identified the source of the problem, but I've tried to provide as much info as I could gather. I also realize that this is likely not the intended system for this application. I'm mainly interested in resolving this for the purpose of development (again, it's the only Nvidia GPU I have, and at about $300 it's cheap for students to buy). After testing and development, I would later run the code on a more dedicated machine with better resources. So if there is anything that can be done to help me resolve these issues, I'd be grateful.
Oh, and as for the host system, here are some of my specs: