BNLNPPS / esi-g4ox


NVIDIA Nsight Systems and Nsight Compute profiling of GPU code #53

Open ggalgoczi opened 4 days ago

ggalgoczi commented 4 days ago

nsys profile ./build/src/simg4ox -g esi-g4ox/geom/opticks_raindrop.gdml -m esi-g4ox/run.mac

was used for profiling. The resulting report file was then processed by running

nsys stats report1.nsys-rep

The resulting SQL file was then processed. The results show which CUDA functions were called and how much time each took. They can be found here: https://docs.google.com/document/d/1qdaJHzAnp4UDB_VeBLBjykyr1RYRCaV7cBAIpLyNwq8/edit?usp=sharing

Additionally, the nsys stats report1.nsys-rep command returned output on the OptiX calls, which can be found here: https://docs.google.com/document/d/1oxdmNiABB5qCNqublfzElHFmKQxshc6hNuD6Xrkq4pU/edit?usp=sharing

For some reason the resulting SQL file did not contain the OptiX calls. This needs further investigation.
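
One way to start that investigation is to query the export directly. The sketch below is only a starting point and rests on assumptions: that the export is a SQLite database (for example one written by nsys export --type sqlite), that the sqlite3 command-line tool is installed, and that the export contains a StringIds table with a value column holding function names. The helper name find_optix_in_export and the file name report1.sqlite are hypothetical.

```shell
# List every table in an nsys SQLite export, then search the string
# table for names containing "optix" (LIKE is case-insensitive for ASCII).
# Assumption: the export has a StringIds(id, value) table.
find_optix_in_export() {
    db="$1"
    # Which tables did the export actually produce?
    sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"
    # Do any recorded names mention OptiX at all?
    sqlite3 "$db" "SELECT value FROM StringIds WHERE value LIKE '%optix%';"
}

# Usage (file name is an assumption):
#   find_optix_in_export report1.sqlite
```

If the second query prints nothing, the OptiX names are probably absent from the export itself rather than lost in later processing.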

What is really interesting is that CreateOrReuse_ECPU did use the GPU, not only the CPU.

Additionally, export OPTICKS_INTEGRATION_MODE=2 should select running the optical simulation on the CPU only. Why is the GPU used then? Only to upload the geometry?
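
To see which GPU calls remain in CPU-only mode, it may help to profile the same run under a given mode and keep one report per mode for comparison. This is a minimal sketch, assuming the binary and input paths from the command above. The helper name profile_with_mode and the report naming scheme are hypothetical.

```shell
# Profile simg4ox under a given OPTICKS_INTEGRATION_MODE and write
# a separate report per mode (report_mode<N>.nsys-rep).
profile_with_mode() {
    mode="$1"
    # The variable is set only for this one nsys invocation;
    # the profiled application inherits it from nsys.
    OPTICKS_INTEGRATION_MODE="$mode" nsys profile \
        --output "report_mode${mode}" --force-overwrite true \
        ./build/src/simg4ox \
        -g esi-g4ox/geom/opticks_raindrop.gdml \
        -m esi-g4ox/run.mac
}

# Usage (needs nsys and the built binary):
#   profile_with_mode 2
#   nsys stats report_mode2.nsys-rep
```

Comparing the mode 2 summary against a run in another mode should show whether the remaining GPU activity is limited to setup work such as the geometry upload.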

ggalgoczi commented 4 days ago

@plexoos I installed Nsight Compute in one of the images. Essentially you need to download the run file and just run it: https://developer.nvidia.com/tools-overview/nsight-compute/get-started

plexoos commented 4 days ago

That's almost exactly what I did (see https://github.com/BNLNPPS/esi-shell/pull/121/commits/9474f1169e6e469d9f41b262ecb2bcc87ebdf0a7). I don't remember why, but I had to create the soft links to run it... Maybe that broke the container. Can you use it without the soft links?

ggalgoczi commented 3 days ago

Yes, Nsight Compute starts but gets stuck:

==PROF== Connected to process 962 (/esi/build/src/simg4ox)
==ERROR== Failed to find metric regex:^LaunchStats\.(sum|min|max|avg|pct|ratio|max_rate)$

==ERROR== Failed to profile "NVIDIA internal (optixAccelBu..." in process 962
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).

The proposed solution is this:

To profile your code with Nsight Compute, enable --generate-line-info and set debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_MODERATE in the OptixModuleCompileOptions in your application host code.

Do you have an idea how to do this?

NVIDIA devs proposed this here:

https://forums.developer.nvidia.com/t/need-help-profiling-an-optix-application/265266

ggalgoczi commented 3 days ago

I found this line in Opticks:

CSGOptiX/OPT.h: else if(strcmp(option, "MODERATE") == 0 ) level = OPTIX_COMPILE_DEBUG_LEVEL_MODERATE ;
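
A search like the one that found this line can be wrapped in a small helper to locate every place the debug level is referenced, which is a first step toward finding where the option string is set. This is a sketch only: the helper name find_debug_level_refs is hypothetical, and the CSGOptiX path assumes the command is run from the top of an Opticks checkout.

```shell
# Recursively list every source line (with file name and line number)
# that mentions an OptiX compile debug level constant.
find_debug_level_refs() {
    src_dir="$1"
    grep -rn "OPTIX_COMPILE_DEBUG_LEVEL" "$src_dir"
}

# Usage (path is an assumption):
#   find_debug_level_refs CSGOptiX
```

From the matches, following the function that contains the MODERATE branch back to its caller should reveal how the option string is supplied.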

plexoos commented 3 days ago

Okay, I'll see if their recommendations help... In the meantime you can try using the new tag -t debug-nsight-compute for the image I just built from this branch https://github.com/BNLNPPS/esi-shell/pull/121. The only difference is that I did not create the soft links.