vitduck opened this issue 1 year ago
Hi @vitduck,
as a temporary workaround it might be possible to use the solution described here instead of messing around with the X virtual framebuffer, though I haven't tried it personally. It should be possible to compile Mesa and LLVMPipe without requiring that they be installed to the system.
We don't have any quick fixes for removing the graphical dependency, but it's something we're considering doing in some fashion. It might be possible to simply remove the OpenGL code from the main file, though if we pick this task up I'd like to add a second target that builds from a separate main file with no graphical component.
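To illustrate the kind of split I mean, here is a rough sketch of a render-free entry point. The names are hypothetical, not the repo's actual API; the real simulator class lives in simulator.cu/.cuh. The key property is that this translation unit never touches OpenGL, GLEW, or a window/context.

// Hypothetical render-free main (sketch only; Simulator is a stand-in,
// not the repo's actual class).
#include <cstddef>
#include <cstdio>
#include <cstdlib>

struct Simulator {
  explicit Simulator(std::size_t n) : num_particles(n) {}
  void step() { /* launch one iteration of the n-body kernel here */ }
  std::size_t num_particles;
};

int main(int argc, char **argv) {
  std::size_t n = (argc > 1) ? std::strtoul(argv[1], nullptr, 10) : 12800;
  int steps = (argc > 2) ? std::atoi(argv[2]) : 100;

  Simulator sim(n);
  for (int i = 0; i < steps; ++i) sim.step();

  std::printf("Ran %d steps on %zu particles without rendering.\n", steps, n);
  return 0;
}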
Duncan.
Hi Duncan,
Thanks for your reply.
I do agree that a second target without the graphical component is better than removing OpenGL altogether. Looking at the code, the rendering seems strongly coupled to the simulation part, so I am not sure it is worth the effort on your end to isolate it.
For now, I will set up a Linux box to test the code.
Hi @vitduck,
We have a PR open that should fix this issue (#30).
I hope this helps!
Hi @DuncanMcBain, thanks very much for the notice.
I am testing the latest commit as follows:
$ module purge
$ module load cuda/10.1
$ sh scripts/build_cuda.sh no_render
-- The CXX compiler identification is GNU 4.8.5
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CUDA compiler: /apps/cuda/10.1/bin/nvcc
-- Check for working CUDA compiler: /apps/cuda/10.1/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.27.1")
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /apps/cuda/10.1 (found version "10.1")
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
GLEW_LIBRARY
-- Build files have been written to: /scratch/optpar01/work/2024/cuda-to-sycl-nbody/build_cuda
Scanning dependencies of target nbody_cuda
[ 25%] Building CXX object src/CMakeFiles/nbody_cuda.dir/nbody.cpp.o
[ 50%] Building CXX object src/CMakeFiles/nbody_cuda.dir/sim_param.cpp.o
[ 75%] Building CUDA object src/CMakeFiles/nbody_cuda.dir/simulator.cu.o
[100%] Linking CXX executable ../../nbody_cuda
[100%] Built target nbody_cuda
Scanning dependencies of target release
[100%] Built target release
So OpenGL libs are no longer required!
However, I encountered the following error when running the compiled binary:
$ ./scripts/run_nbody.sh -b cuda 100 10
GPUassert: initialization error /scratch/optpar01/work/2024/cuda-to-sycl-nbody/src/simulator.cuh 94
Looking at the relevant line of simulator.cuh, it is just a standard cudaMalloc:
92     ParticleData_d(size_t n) {
93       // Allocate device memory for particle coords & velocity...
94       gpuErrchk(cudaMalloc((void **)&x, sizeof(coords_t) * n));
95       gpuErrchk(cudaMalloc((void **)&y, sizeof(coords_t) * n));
96       gpuErrchk(cudaMalloc((void **)&z, sizeof(coords_t) * n));
97     };
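For context, gpuErrchk is presumably the usual assert-style CUDA error-check macro (a common pattern, sketched below; the exact definition in the repo may differ). It produces messages of the form "GPUassert: <error string> <file> <line>", which matches the output above.

// Common CUDA error-check macro pattern (a sketch; the repo's exact
// definition may differ).
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

inline void gpuAssert(cudaError_t code, const char *file, int line,
                      bool abort = true) {
  if (code != cudaSuccess) {
    std::fprintf(stderr, "GPUassert: %s %s %d\n",
                 cudaGetErrorString(code), file, line);
    if (abort) std::exit(code);
  }
}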
I tried smaller system sizes as well, but the error persists (we have 40 GB of memory). Do you have any insight into this issue?
Hi @vitduck,
We won't really be able to help with the pure CUDA version of the code (we didn't write it), but if you're able to try the SYCL version we'd be happy to help with that!
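As a first step, it's worth confirming which devices the SYCL runtime can actually see, either with sycl-ls or from code. A minimal enumeration sketch using the standard SYCL 2020 API (not code from this repo):

// List visible SYCL platforms and devices (standard SYCL 2020 API),
// similar in spirit to what sycl-ls prints.
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  for (const auto &platform : sycl::platform::get_platforms()) {
    std::cout << platform.get_info<sycl::info::platform::name>() << "\n";
    for (const auto &device : platform.get_devices()) {
      std::cout << "  " << device.get_info<sycl::info::device::name>() << "\n";
    }
  }
  return 0;
}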
Duncan, sorry, an oversight on my part: the aforementioned CUDA error was due to the MIG partition. Both the CUDA and SYCL-migrated codes can now be built and run without rendering.
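For anyone hitting the same thing, a minimal runtime probe along the following lines (a sketch, not part of the repo) is enough to tell a driver- or MIG-level initialization failure apart from a bug in the application:

// Minimal CUDA runtime probe (sketch, not part of the repo). If this
// fails, the problem lies in the driver/device configuration (e.g. an
// inaccessible MIG partition), not in the application code.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  std::printf("visible CUDA devices: %d\n", count);

  // cudaFree(0) forces context creation, the step that fails with
  // "initialization error" when the device is not actually usable.
  err = cudaFree(nullptr);
  std::printf("context creation: %s\n",
              err == cudaSuccess ? "ok" : cudaGetErrorString(err));
  return err == cudaSuccess ? 0 : 1;
}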
Could you kindly confirm whether the following output is expected? (If I understand correctly, the kernel time is measured in ms.)
$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.6.0.22_223734]
[opencl:cpu:1] Intel(R) OpenCL, AMD EPYC 7543 32-Core Processor 3.0 [2023.16.6.0.22_223734]
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.8 [CUDA 11.6]
$ ./nbody_cuda 50 10 0.999998 0.005 1.0e-7 2 10000
...
At step 10000 kernel time is 15.4361 and mean is 15.435 and stddev is: 0.0853953
$ SYCL_DEVICE_FILTER=cuda ./nbody_dpcpp 50 10 0.999998 0.005 1.0e-7 2 10000
...
At step 10000 kernel time is 8.60655 and mean is 8.60897 and stddev is: 0.0694211
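As a sanity check on the units: per-step kernel timings in this style are typically collected with CUDA events, and cudaEventElapsedTime reports milliseconds. A minimal sketch of such a loop (hypothetical, not necessarily the repo's actual harness):

// cudaEvent-based per-step kernel timing with running mean and stddev
// (hypothetical sketch, not the repo's actual harness).
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel() {}  // stand-in for the n-body step kernel

int main() {
  const int steps = 10000;
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  double sum = 0.0, sum_sq = 0.0;
  for (int step = 1; step <= steps; ++step) {
    cudaEventRecord(start);
    dummy_kernel<<<1, 1>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
    sum += ms;
    sum_sq += static_cast<double>(ms) * ms;
  }
  const double mean = sum / steps;
  const double stddev = std::sqrt(std::max(0.0, sum_sq / steps - mean * mean));
  std::printf("mean kernel time: %g ms, stddev: %g ms\n", mean, stddev);

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return 0;
}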
I would have expected rough parity between native CUDA and SYCL, with a slight edge for the former. Here, the result unexpectedly shows that SYCL on the CUDA backend is almost twice as fast. I am not sure how to interpret this outcome.
Hi @vitduck, we have a section in the README (the last section) that covers performance. Back when we were working on this, we managed to get the results to be about the same between CUDA and SYCL on a 3060 GPU. Obviously the software stack has changed since then, so it's hard to say exactly what might be similar or different now.
I'll check with a colleague; we might be able to send you some of our updated numbers. You could also profile with the NVIDIA Nsight Compute tool to see if there is anything obvious going on.
Hello,
I've successfully built the CUDA version of the code.
Is it possible to measure performance without relying on OpenGL or Xvfb? In a public supercomputer environment, it is very difficult to request the installation of the dependencies required for running the test.
The OpenGL dependencies could in principle be installed locally through Spack. However, as mentioned in the repo, running the code through X-tunneling is not recommended. As for Xvfb, it is part of Xorg, and we cannot request installation of these packages in a shared environment. Also, an error is generated when running the CUDA version on CentOS 7.9.
I would appreciate it if you could provide some suggestions to work around these issues.
Regards.