BNLNPPS / esi-g4ox

Performance Comparison of Optical Photon Simulation on CPU vs GPU #32

Open plexoos opened 1 month ago

plexoos commented 1 month ago

We need to benchmark the performance of optical photon simulations by comparing the computation times on CPU and GPU architectures. The key metric will be the simulation time as a function of the number of generated photons.
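A minimal sketch of the measurement described above: wall-clock time as a function of the number of photons. The `simulate` callable and the `fake_simulate` placeholder are hypothetical stand-ins, not part of Geant4 or Opticks; a real run would wrap whatever launches the simulation for a given photon count.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_vs_photons(
    simulate: Callable[[int], None],
    photon_counts: Iterable[int],
    repeats: int = 3,
) -> List[Tuple[int, float]]:
    """Return (n_photons, best wall-clock seconds) for each photon count.

    `simulate` is a hypothetical callable that runs one simulation of
    n photons and returns only once the work has finished (for a GPU
    run it must block until the device is done).
    """
    results = []
    for n in photon_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()  # monotonic, high-resolution timer
            simulate(n)
            best = min(best, time.perf_counter() - start)
        results.append((n, best))
    return results


def fake_simulate(n_photons: int) -> None:
    """Placeholder workload that scales linearly with the photon count."""
    total = 0
    for i in range(n_photons):
        total += i * i


if __name__ == "__main__":
    for n, seconds in time_vs_photons(fake_simulate, [10**3, 10**4, 10**5]):
        print(f"{n:>8d} photons  {seconds:.6f} s  {seconds / n:.3e} s/photon")
```

Taking the best of a few repeats reduces the influence of one-off startup costs; whether those costs should be included in the comparison is a separate decision.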

buddhasystem commented 1 month ago

CPU and GPU, both under Opticks, or in different frameworks?

ggalgoczi commented 1 month ago

There are essentially three options for running the optical photon simulation. We need at least the first two:

-- Running pure Geant4
-- Running Opticks on GPU
-- Running Opticks on CPU

plexoos commented 1 month ago

Yes, we should focus on leveraging the existing Opticks code for GPU first. I did not get the impression that Mitsuba is easier to work with.

buddhasystem commented 1 month ago

> Yes, we should focus on leveraging the existing Opticks code for GPU first. I did not get the impression that Mitsuba is easier to work with.

It's a bit different: we don't yet have a working interface from G4 to Mitsuba, so the question is not ease of use but basic feasibility. I hope to get to that in time. As to the previous comment, yes, plain G4 vs Opticks seems to be the most useful case to look at.

plexoos commented 1 month ago

@ggalgoczi Do you have any tips on how to enable timing measurements in Opticks?

ggalgoczi commented 1 month ago

For CUDA kernels, Opticks seems to use Nsight. Specifically, bin/nsight.bash would produce a detailed report that includes the execution time of each GPU kernel and other system-wide performance metrics.

Also nsys is used:

nsys profile -o noprefetch --stats=true ./add_cuda

I did not test it yet. Should we take a look at it next week?

I also found the OpticksProfile class, which seems to profile time and memory usage, but it is not yet clear to me how it is used.

For the total simulation time including overhead I would do something like this:

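    #include <chrono>

    // steady_clock is monotonic, which makes it the safer choice for measuring intervals
    auto start = std::chrono::steady_clock::now();
    // Opticks stuff (must have finished, including any GPU work, before `end` is taken)
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = end - start;  // elapsed wall-clock time in seconds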

plexoos commented 1 month ago

I've looked at these too and came to the same conclusion. OpticksProfile and stime seem to be "dead" code, and I can't see how they were used to get any useful information.

> I did not test it yet. Should we take a look at it next week?

Yes, could you please take a look? Do we need to install nsight-systems-cli to get nsys? My attempt to install it in the container is at https://github.com/BNLNPPS/esi-shell/pull/121 and simply follows the instructions at https://docs.nvidia.com/nsight-systems/InstallationGuide/index.html#package-manager-installation

ggalgoczi commented 1 month ago

To get the best profiling results, the following is needed. @plexoos, could you assist with this? I remember you dug into the PTX stuff.

-- set debugLevel in the OptixModuleCompileOptions to OPTIX_COMPILE_DEBUG_LEVEL_MODERATE

If that doesn’t fix it, try setting the environment variable OPTIX_FORCE_DEPRECATED_LAUNCHER only while profiling

From: https://forums.developer.nvidia.com/t/need-help-profiling-an-optix-application/265266/5

ggalgoczi commented 1 month ago

In addition to Nsight Systems, we need to install Nsight Compute.

plexoos commented 1 month ago

Okay, it appears they have a GUI nsys-ui to visualize profiling results. Trying to install and run it...

plexoos commented 1 month ago

Argh... Unfortunately, it fails in the container:

$ nsys-ui 
Warning: Failed to get OpenGL version. OpenGL version 2.0 or higher is required.
OpenGL version is too low (0). Falling back to Mesa software rendering.
/opt/nvidia/nsight-systems/2024.6.1/host-linux-x64/CrashReporter: error while loading shared libraries: libGLX.so.0: cannot open shared object file: No such file or directory

plexoos commented 1 month ago

> To get the best profiling results, the following is needed. @plexoos, could you assist with this? I remember you dug into the PTX stuff.
>
> -- set debugLevel in the OptixModuleCompileOptions to OPTIX_COMPILE_DEBUG_LEVEL_MODERATE
>
> If that doesn’t fix it, try setting the environment variable OPTIX_FORCE_DEPRECATED_LAUNCHER only while profiling
>
> From: https://forums.developer.nvidia.com/t/need-help-profiling-an-optix-application/265266/5

Yes, I think I know where it should be set... But what exactly did not work for you? What have you tried to run?

ggalgoczi commented 1 month ago

I have not tried anything in the Docker container yet. Could you install Nsight Compute and Nsight Systems there?

The GUI you mentioned does not need to be there; I tried and used it on my own PC.

plexoos commented 1 month ago

Were you able to install nsight compute on your PC?

ggalgoczi commented 1 month ago

I have not tried that one yet. I installed Nsight Systems. For Nsight Compute I downloaded a .run file for Ubuntu but have not run it yet.

plexoos commented 1 month ago

Yay! I think I figured out the dependencies for both the nsight-systems and nsight-compute. Both tools and their UIs can be used from the container:

esi-shell -t debug nsys-ui -- -e HOME=$HOME -w $HOME -e DISPLAY=$DISPLAY --net=host

ggalgoczi commented 1 month ago

When I try to open esi-shell with the command provided, the shell opens with the GUI. However once I close the GUI, the shell automatically closes. How can I keep the image open?

plexoos commented 1 month ago

Just don't specify the command and the shell will be interactive, e.g.

esi-shell -t debug -- -e HOME=$HOME -w $HOME -e DISPLAY=$DISPLAY --net=host

In this case you would need to type the command, of course.

ggalgoczi commented 1 month ago

Thanks, still getting errors. Once I open the image and run:

cd $HOME
cmake -S esi-g4ox -B build
cmake --build build

I get:

CMake Error at /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/cmake-3.27.7-kd6xihnhlzyfwafeilow6t2mvolyryor/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Failed to find XercesC (missing: XercesC_VERSION) (Required is at least
  version "3.2.4")
Call Stack (most recent call first):
  /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/cmake-3.27.7-kd6xihnhlzyfwafeilow6t2mvolyryor/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/cmake-3.27.7-kd6xihnhlzyfwafeilow6t2mvolyryor/share/cmake-3.27/Modules/FindXercesC.cmake:112 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/cmake-3.27.7-kd6xihnhlzyfwafeilow6t2mvolyryor/share/cmake-3.27/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
  /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/geant4-11.1.2-djnxepkcbdv5nknlkhjcn2fs7uqkn5zy/lib/cmake/Geant4/Geant4Config.cmake:311 (find_dependency)
  CMakeLists.txt:31 (find_package)

plexoos commented 1 month ago

Does this work?

esi-shell -t debug 'cd $HOME && cmake -S esi-g4ox -B build && cmake --build build' -- -e HOME=$HOME -w $HOME

ggalgoczi commented 1 month ago

Unfortunately no. I sent the trace by e-mail; it is too long to post here.

plexoos commented 1 month ago

Works for me. I assume you created a new build directory and are using the current HEAD of esi-g4ox (666cc0f7).

ggalgoczi commented 1 month ago

Let me repeat the steps I took; maybe I did something wrong. Let me know.

I ssh into my account on npps0 and just call the command you shared:

esi-shell -t debug 'cd $HOME && cmake -S esi-g4ox -B build && cmake --build build' -- -e HOME=$HOME -w $HOME

Do I also have to pull the latest version of the GitHub repository into my npps0 home folder? Or what step did I miss?

plexoos commented 1 month ago

Try to delete the existing 'build' directory or use a different name in the above command. Also, I don't know if you have any local changes in your esi-g4ox directory, I am just assuming that your esi-g4ox is at the current HEAD of the main branch.

plexoos commented 1 month ago

Maybe it is not clear from the command but your entire HOME is mounted in the container.

ggalgoczi commented 1 month ago

I tried deleting build and pulling the latest GitHub repo. It still does not work. Thanks for the idea; I guess something I installed in my $HOME a while back interferes with the image. I am not sure, so I will resort to not mounting it and using:

esi-shell -t debug -- -e DISPLAY=$DISPLAY --net=host

plexoos commented 1 month ago

Here is a completely isolated test:

esi-shell -t debug 'cd /tmp && git clone https://github.com/BNLNPPS/esi-g4ox.git && cmake -S esi-g4ox -B build && cmake --build build'

It must work for everyone with an account on npps0 😕

Also, you can make sure your esi-shell executable matches mine:

[dmitri@npps0:~] 
$ esi-shell --version
1.0.0-583deac
[dmitri@npps0:~] 
$ which esi-shell
/usr/local/bin/esi-shell

ggalgoczi commented 1 month ago

The isolated test works!

Also I get the same:

[galgoczi@npps0 ~]$ esi-shell --version
1.0.0-583deac
[galgoczi@npps0 ~]$ which esi-shell
/usr/local/bin/esi-shell

ggalgoczi commented 2 weeks ago

Managed to get Opticks running on CPU instead of GPU. The magic trick is to set

G4CXOpticks::NoGPU = true;

I put this here for later use when performing testing.

ggalgoczi commented 2 weeks ago

It seems the G4CXOpticks photon simulation cannot run on the CPU, since in G4CXOpticks::simulate we have

if(NoGPU) return ;

What confused me was that the geometry translation is still done in this case :)

ggalgoczi commented 1 day ago

Very useful info from Simon:

Comparing A:Opticks and B:Geant4 simulations when using input photons (i.e. the exact same CPU generated photons in both A and B) is a powerful way to find geometry and other issues.

The so-called "record" array records every step point of the photon history. This detailed step history can also be recorded from the Geant4 side using the U4Recorder, which allows the photon histories from Geant4 to be recorded in Opticks SEvt-format NumPy arrays.

Statistical comparison between the A and B NumPy arrays is the first thing to do for validation.

Going further, it is possible to arrange for Geant4 to provide the same set of precooked randoms that curand generates (by replacing the Geant4 "engine"; see u4/U4Random.hh). I call that aligned running: it means scatters, reflections, transmissions, etc. all happen at the same places in both simulations, so the resulting arrays can be compared directly, unclouded by statistics.

https://groups.io/g/opticks/message/542
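As an illustration of the statistical comparison mentioned above, here is a minimal sketch that compares how often each photon history occurs in samples A and B. It is not the Opticks analysis code: the function name, the history labels and the `min_count` cut are all assumptions made for the example, and the simple chi-square form used here assumes A and B contain the same number of photons (as they do when both start from the same input photons).

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def history_chi2(
    hist_a: Iterable[Hashable],
    hist_b: Iterable[Hashable],
    min_count: int = 10,
) -> Tuple[float, int]:
    """Chi-square distance between two samples of per-photon history labels.

    Each input holds one label per photon (a string or integer code).
    Categories whose combined count is below `min_count` are skipped,
    since they carry too little statistics. Returns (chi2, n_bins_used).
    Assumes both samples contain the same number of photons.
    """
    counts_a = Counter(hist_a)
    counts_b = Counter(hist_b)
    chi2 = 0.0
    n_bins = 0
    for key in set(counts_a) | set(counts_b):
        a, b = counts_a[key], counts_b[key]  # a missing key counts as 0
        if a + b < min_count:
            continue
        chi2 += (a - b) ** 2 / (a + b)
        n_bins += 1
    return chi2, n_bins


if __name__ == "__main__":
    # Hypothetical labels; real ones would come from the recorded arrays.
    sample_a = ["detected"] * 60 + ["absorbed"] * 40
    sample_b = ["detected"] * 40 + ["absorbed"] * 60
    chi2, n_bins = history_chi2(sample_a, sample_b)
    print(f"chi2 = {chi2:.2f} over {n_bins} categories")
```

A chi2 that is large compared with the number of categories used points to a real difference between the two simulations rather than a statistical fluctuation. A one-dimensional NumPy array of history codes can be passed directly, since the function only iterates over its input.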