Closed fiedl closed 6 years ago
The flasher-pulse-width and brightness settings scale with the total number of recorded hits, i.e. scale with the total number of photons (https://github.com/fiedl/hole-ice-study/issues/59#issuecomment-385410216, https://github.com/fiedl/hole-ice-study/issues/57#issuecomment-384267878). They also scale (non linear) with the job run time.
Image source: https://github.com/fiedl/hole-ice-study/issues/57#issuecomment-384267878
I'm extracting the number of photons from clsim propagation log files.
These files contain several bunches of photons:
[2018-05-24 11:25:47] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/results/flasher_parameter_scan/esca3.5_r2.0rdom_abs100_width127_bright63
▶ grep photon clsim.log
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 622.818 nanoseconds/photon (util: 76%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event&, const boost::posix_time::ptime&, uint64_t, bool, const std::string&, const std::string&, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 1398208 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption&, bool&, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 622.402 nanoseconds/photon (util: 96%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event&, const boost::posix_time::ptime&, uint64_t, bool, const std::string&, const std::string&, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 1396351 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption&, bool&, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 623.593 nanoseconds/photon (util: 98%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event&, const boost::posix_time::ptime&, uint64_t, bool, const std::string&, const std::string&, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 1398945 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption&, bool&, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 623.366 nanoseconds/photon (util: 98%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event&, const boost::posix_time::ptime&, uint64_t, bool, const std::string&, const std::string&, uint64_t))
# ...
Sum them up:
[2018-05-24 11:41:34] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/results/flasher_pulse_widths/esca0.05_r0.5rdom_abs100_width112
▶ grep photons clsim.log |cut -d " " -f 9 |paste -sd+ |bc
36993469
For different flasher-pulse widths:
[2018-05-24 11:42:56] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/results/flasher_pulse_widths
▶ for folder in esca*; do echo $folder; grep photons $folder/clsim.log |cut -d " " -f 9 |paste -sd+ |bc; done
esca0.05_r0.5rdom_abs100_width112
36993469
esca0.05_r0.5rdom_abs100_width127
41876232
esca0.05_r0.5rdom_abs100_width16
5745055
esca0.05_r0.5rdom_abs100_width32
10949845
esca0.05_r0.5rdom_abs100_width48
16158510
esca0.05_r0.5rdom_abs100_width64
21365987
esca0.05_r0.5rdom_abs100_width80
26573831
esca0.05_r0.5rdom_abs100_width96
31791195
simulation_total_number_of_photons_by_width = {
112: 36993469,
127: 41876232,
16: 5745055,
32: 10949845,
48: 16158510,
64: 21365987,
80: 26573831,
96: 31791195
}
Plot those:
[2018-05-24 11:50:06] fiedl@fiedl-mbp ~
▶ which plot
plot () {
plot_expression="$*"
[[ -z $plot_expression ]] && plot_expression="'<cat'"
gnuplot -e "set term png; plot $plot_expression" | ~/.iterm2/imgcat
}
▶ echo "112 36993469\n127 41876232\n16 5745055\n32 10949845\n48 16158510\n64 21365987\n80 26573831\n96 31791195" |plot
The number of propagated photons scales linear with the pulse width setting.
Plot the simulation time vs. the number of propagated photons:
[2018-05-24 12:07:56] fiedl@fiedl-mbp ~/hole-ice-study/scripts/lib master ⚡
▶ python plot_simulation_time_vs_number_of_photons.py
Plot simulation time per photon vs. pulse width:
[2018-05-24 13:28:01] fiedl@fiedl-mbp ~/hole-ice-study/scripts/lib master ⚡
▶ python plot_simulation_time_vs_number_of_photons.py
Simulation time is about 1e6 ns per photon.
This is really strange, because the propagation time is much smaller.
[2018-05-24 13:35:33] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/results/flasher_pulse_widths
▶ grep "kernel statistics" esca0.05_r0.5rdom_abs100_width*/clsim.log |head -n 1
esca0.05_r0.5rdom_abs100_width112/clsim.log:INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 553.535 nanoseconds/photon (util: 76%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event&, const boost::posix_time::ptime&, uint64_t, bool, const std::string&, const std::string&, uint64_t))
▶ grep "kernel statistics" esca0.05_r0.5rdom_abs100_width*/clsim.log |cut -d " " -f 5 |shuf |head -n 20
253.643
258.869
253.707
651.095
253.765
254.317
259.274
253.622
653.176
558.418
253.789
554.337
555.549
571.87
259.242
676.258
652.15
674.927
253.72
570.226
The mean is about 500ns for the propagation of each photon (compared to 1e6 ns per photon for the whole simulation):
[2018-05-24 13:39:29] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/results/flasher_pulse_widths
▶ grep "kernel statistics" esca0.05_r0.5rdom_abs100_width*/clsim.log |cut -d " " -f 5 |awk '{ total += $1; count++ } END { print total/count }'
476.894
This means the huge performance loss does happen outside the gpu kernel.
I'm running a simple flasher simulation with my clsim vs. standard clsim.
I'm running a flasher simulation with pulse width settings and a brightness settings of 8, 16 and 24.
[2018-05-24 13:39:50] fiedl@wgs16
▶ qrsh -l gpu -l tmpdir_size=10G -l h_rss=8G -l s_rt=11:00:00
[2018-05-24 14:21:47] fiedl@kepler26 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ ./run.rb --help
Usage: bundle exec ruby run.rb [options]
--hole-ice=METHOD simulation or approximation
--no-hole-ice Run without hole-ice code
--width=WIDTH Flahser pulse width, min = 0, max = 127
--brightness=BRIGHTNESS Flasher pulse brightness, min = 0, max = 127
--effective-scattering-length=METRES
--absorption-length=METRES
--hole-ice-radius-in-dom-radii=R
--cables Simulate cable shadows
[2018-05-24 14:23:46] fiedl@kepler26 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ $ICESIM/env-shell.sh
[2018-05-24 14:35:23] fiedl@kepler26 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ rm tmp/*
▶ time ./run.rb --hole-ice=simulation --width=8 --brightness=8 --effective-scattering-length=0.1 --absorption-length=0.1 --hole-ice-radius-in-dom-radii=0.5
0.43s user 0.42s system 2% cpu 29.088 total
[2018-05-24 14:51:06] fiedl@kepler26 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ rm tmp/*
▶ time ./run.rb --hole-ice=simulation --width=16 --brightness=16 --effective-scattering-length=0.1 --absorption-length=0.1 --hole-ice-radius-in-dom-radii=0.5
0.42s user 0.37s system 0% cpu 2:10.41 total
[2018-05-24 16:44:36] fiedl@kepler21 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ rm tmp/*
▶ time ./run.rb --hole-ice=simulation --width=24 --brightness=24 --effective-scattering-length=100.0 --absorption-length=100.0 --hole-ice-radius-in-dom-radii=0.5
NOTICE (I3Tray): I3Tray finishing... (I3Tray.cxx:473 in void I3Tray::Execute(unsigned int))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 568.109 nanoseconds/photon (util: 87%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 162012 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 588.473 nanoseconds/photon (util: 100%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 155908 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 595.862 nanoseconds/photon (util: 100%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 121164 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimModule): Flushing I3Tray.. (I3CLSimModule.cxx:1639 in virtual void I3CLSimModule::Finish())
0.34s user 0.30s system 0% cpu 12:21.84 total
Setting: 8, runtime: 29s. Setting: 16 (4 times as much photons), runtime: 130s (4.5 times as much simulation time). Setting: 24 (9 times as much photons), runtime: 741s (25.5 times as much simulation time).
How to compile: 2018-01-23 Installing IceSim in Zeuthen
Switch to standard clsim for icesim-v05-00-07:
[2018-05-24 15:17:48] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/clsim
▶ git co icesim-v05-00-07
[2018-05-24 15:00:46] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/software/icecube-simulation-V05-00-07
▶ ll src |grep clsim
lrwxr-xr-x 1 fiedl icecube 44 Jan 25 11:48 clsim -> /afs/ifh.de/group/amanda/scratch/fiedl/clsim
Compile:
[2018-05-24 15:30:18] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/software/icecube-simulation-V05-00-07
▶ rm -r debug_build
▶ mkdir debug_build
[2018-05-24 15:30:52] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/software/icecube-simulation-V05-00-07/debug_build
▶ cmake -D CMAKE_BUILD_TYPE=Debug -D SYSTEM_PACKAGES=true -D CMAKE_BUILD_TYPE:STRING=Debug ../src
▶ ./env-shell.sh
[2018-05-24 15:32:00] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/software/icecube-simulation-V05-00-07/debug_build
▶ make -j 6
Perform simulation with standard clsim:
[2018-05-24 16:33:24] fiedl@kepler21 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ rm tmp/*
▶ time ./run.rb --hole-ice=approximation --width=24 --brightness=24 --effective-scattering-length=100.0 --absorption-length=100.0 --hole-ice-radius-in-dom-radii=0.5
NOTICE (I3Tray): I3Tray finishing... (I3Tray.cxx:473 in void I3Tray::Execute(unsigned int))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 37.0069 nanoseconds/photon (util: 30%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1108 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 130810 (I3CLSimStepToPhotonConverterOpenCL.cxx:990 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 36.9052 nanoseconds/photon (util: 79%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1108 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 124503 (I3CLSimStepToPhotonConverterOpenCL.cxx:990 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 35.8627 nanoseconds/photon (util: 99%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1108 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 95246 (I3CLSimStepToPhotonConverterOpenCL.cxx:990 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimModule): Flushing I3Tray.. (I3CLSimModule.cxx:1588 in virtual void I3CLSimModule::Finish())
0.31s user 0.28s system 0% cpu 2:02.91 total
Switch back to hole-ice clsim:
[2018-05-24 16:39:49] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/clsim
▶ git co sf/hole-ice-2018
[2018-05-24 16:40:29] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/software/icecube-simulation-V05-00-07/debug_build
▶ ./env-shell.sh
▶ make
With the same flasher settings and large scattering and absorption lengths on the same hardware:
This does not look good.
I'm not convinced that the number of downloaded photons is the same as the number of propagated photons. If only hits are written down, then the number of propagated photons is much larger.
If a large number of medium changes is responsible for the performance loss, then this simulation will be useless in a real scenario where all cylinders and cables are implemented in the geometry.
If ppc is faster, what is ppc doing differently?
Configuration:
strings_with_hole_ice = [63]+ [62, 54, 55, 64, 71, 70] + [61, 53, 44, 45, 46, 56, 65, 72, 78, 77, 76, 69]
strings_with_hole_ice = [63]
Run flasher simulation:
[2018-05-24 17:55:29] fiedl@kepler21 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ rm tmp/*
▶ time ./run.rb --hole-ice=simulation --width=24 --brightness=24 --effective-scattering-length=100.0 --absorption-length=100.0 --hole-ice-radius-in-dom-radii=0.5
# ...
NOTICE (I3Tray): I3Tray finishing... (I3Tray.cxx:473 in void I3Tray::Execute(unsigned int))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 544.713 nanoseconds/photon (util: 86%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 162068 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 562.101 nanoseconds/photon (util: 100%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 155938 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 569.499 nanoseconds/photon (util: 100%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 121236 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimModule): Flushing I3Tray.. (I3CLSimModule.cxx:1639 in virtual void I3CLSimModule::Finish())
# ...
0.29s user 0.26s system 0% cpu 12:30.89 total
The number of cylinder appears to play a subdominant role.
Configuration:
strings_with_hole_ice = [63]
Comment out ice layers in clsim/resources/kernels/lib/propagation_through_media
// add_ice_layers_on_photon_path_to_medium_changes(
// photonPosAndTime,
// photonDirAndWlen,
// photonRange,
//
// // These values will be updates within this function:
// &number_of_medium_changes,
// distances_to_medium_changes,
// local_scattering_lengths,
// local_absorption_lengths
// );
Run the simulation:
[2018-05-25 09:13:58] fiedl@wgs16 ~
▶ qrsh -l gpu -l tmpdir_size=10G -l h_rss=8G -l s_rt=14:00:00
[2018-05-25 09:14:19] fiedl@kepler19 ~
▶ $ICESIM/env-shell.sh
[2018-05-25 09:23:06] fiedl@kepler19 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ time ./run.rb --hole-ice=simulation --width=24 --brightness=24 --effective-scattering-length=100.0 --absorption-length=100.0 --hole-ice-radius-in-dom-radii=0.5
NOTICE (I3Tray): I3Tray finishing... (I3Tray.cxx:473 in void I3Tray::Execute(unsigned int))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 598.538 nanoseconds/photon (util: 82%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 167356 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 606.615 nanoseconds/photon (util: 100%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 161083 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 608.811 nanoseconds/photon (util: 100%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 123460 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimModule): Flushing I3Tray.. (I3CLSimModule.cxx:1639 in virtual void I3CLSimModule::Finish())
0.36s user 0.29s system 0% cpu 14:21.80 total
This cluster worker seemed overall slow, also when creating the geometry.
Configuration:
Run the flasher simulation:
[2018-05-25 09:57:34] fiedl@kepler19 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ time ./run.rb --hole-ice=simulation --width=24 --brightness=24 --effective-scattering-length=100.0 --absorption-length=100.0 --hole-ice-radius-in-dom-radii=0.5
NOTICE (I3Tray): I3Tray finishing... (I3Tray.cxx:473 in void I3Tray::Execute(unsigned int))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 22.3237 nanoseconds/photon (util: 15%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 167503 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 22.2416 nanoseconds/photon (util: 40%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 161078 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 21.6648 nanoseconds/photon (util: 36%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 123383 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimModule): Flushing I3Tray.. (I3CLSimModule.cxx:1639 in virtual void I3CLSimModule::Finish())
0.30s user 0.28s system 0% cpu 2:34.29 total
This is much faster
These results suggest that the new code for medium changes takes 30 times as long as standard clsim.
But earlier performance measurements didn't indicate this at all:
Trying to narrow down where the performance is lost.
Hole Ice Layers Sorting Result
0 0 0 fast
1 0 1 slow
1 1 1 slow
0 1 0 slow
1 0 0 slow
0 0 1 fast
If hole ice or layers are on, then it's slow. This suggests that the performance loss happens if there are medium changes.
Adding an arbitrary medium change by hand:
int number_of_medium_changes = 1;
floating_t distances_to_medium_changes[MEDIUM_LAYERS] = {0.0, 1.0};
int currentPhotonLayer = min(max(findLayerForGivenZPos(photonPosAndTime.z), 0), MEDIUM_LAYERS-1);
floating_t local_scattering_lengths[MEDIUM_LAYERS] = {getScatteringLength(currentPhotonLayer, photonDirAndWlen.w), 100.0};
floating_t local_absorption_lengths[MEDIUM_LAYERS] = {getAbsorptionLength(currentPhotonLayer, photonDirAndWlen.w), 100.0};
Hole Ice Layers Sorting Result
0 0 0 +arbitrary medium change fast
0 0 1 +arbitrary medium change fast
Why is the arbitrary medium change fast and one hole-ice cylinder slow?
Hole Ice Layers Sorting Result
1 0 1 slow
0 0 1 +arbitrary medium change fast
What if adding the hole-ice cylinder but re-setting to the stipulative values after that?
#ifdef HOLE_ICE
add_hole_ice_cylinders_on_photon_path_to_medium_changes(
photonPosAndTime,
photonDirAndWlen,
photonRange,
numberOfCylinders,
cylinderPositionsAndRadii,
// These values will be updates within this function:
&number_of_medium_changes,
distances_to_medium_changes,
local_scattering_lengths,
local_absorption_lengths
);
#endif
number_of_medium_changes = 1;
distances_to_medium_changes[0] = 0.0;
distances_to_medium_changes[1] = 1.0;
currentPhotonLayer = min(max(findLayerForGivenZPos(photonPosAndTime.z), 0), MEDIUM_LAYERS-1);
local_scattering_lengths[0] = getScatteringLength(currentPhotonLayer, photonDirAndWlen.w);
local_scattering_lengths[1] = 100.0;
local_absorption_lengths[0] = getAbsorptionLength(currentPhotonLayer, photonDirAndWlen.w);
local_absorption_lengths[1] = 100.0;
This is interesting: It's not as fast as without hole ice, but it's also not slow: 48 ns per photon.
NOTICE (I3Tray): I3Tray finishing... (I3Tray.cxx:473 in void I3Tray::Execute(unsigned int))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 43.7711 nanoseconds/photon (util: 31%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 127445 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 49.148 nanoseconds/photon (util: 81%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 132378 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 51.7027 nanoseconds/photon (util: 100%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1210 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 106994 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimModule): Flushing I3Tray.. (I3CLSimModule.cxx:1639 in virtual void I3CLSimModule::Finish())
Hole Ice Layers Sorting Result
1 0 1 slow
0 0 1 +arbitrary medium change fast
1 0 1 +reset medium change somewhat fast
What if adding hole ice, without arbitrary medium change, but skipping loop_over_media_and_calculate_geometrical_distances_up_to_the_next_scattering_point
, replacing it with a less complicated calculation.
// loop_over_media_and_calculate_geometrical_distances_up_to_the_next_scattering_point(
// number_of_medium_changes,
// distances_to_medium_changes,
// local_scattering_lengths,
// local_absorption_lengths,
//
// // These values will be updates within this function:
// sca_step_left,
// abs_lens_left,
// distancePropagated,
// distanceToAbsorption
// );
// Spend the rest of the budget with the last medium properties.
if (*sca_step_left > 0) {
*distancePropagated += *sca_step_left * local_scattering_lengths[number_of_medium_changes];
*distanceToAbsorption += *abs_lens_left * local_absorption_lengths[number_of_medium_changes];
*abs_lens_left -= my_divide(*distancePropagated, local_absorption_lengths[number_of_medium_changes]);
}
// If the photon is absorbed, only propagate up to the absorption point.
if (*distanceToAbsorption < *distancePropagated) {
*distancePropagated = *distanceToAbsorption;
*distanceToAbsorption = ZERO;
*abs_lens_left = ZERO;
}
Hole Ice Layers Sorting Result
1 0 1 simple calculation slow
I'm trying some kernel profiling in https://github.com/fiedl/hole-ice-study/issues/69.
I'm not convinced that the downloaded number of photons is the total number of propagated photons. Depending on the configuration, only photons that hit a DOM need to be downloaded.
I'm repeating a small flasher simulation on the cluster with that new output in order to get the scaling factor of the number of downloaded photons and the total number of propagated photons.
[2018-05-26 11:03:59] fiedl@wgs16 ~
▶ qrsh -l gpu -l tmpdir_size=10G -l h_rss=8G -l s_rt=15:00:00
[2018-05-26 11:04:41] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/clsim
▶ git pull
[2018-05-26 11:07:10] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/software/icecube-simulation-V05-00-07/debug_build
▶ $ICESIM/env-shell.sh
▶ make
[2018-05-26 11:05:34] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/software/icecube-simulation-V05-00-07/debug_build/clsim
▶ $ICESIM/env-shell.sh
[2018-05-26 11:09:27] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FlasherSimulation
▶ rm tmp/*
▶ time ./run.rb --hole-ice=simulation --width=24 --brightness=24 --effective-scattering-length=100.0 --absorption-length=100.0 --hole-ice-radius-in-dom-radii=0.5
NOTICE (I3Tray): I3Tray finishing... (I3Tray.cxx:473 in void I3Tray::Execute(unsigned int))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 546.852 nanoseconds/photon, 409291894 photons (util: 88%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1212 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 162068 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 563.272 nanoseconds/photon, 409291942 photons (util: 100%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1212 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 155938 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 571.064 nanoseconds/photon, 323089575 photons (util: 100%) (NVIDIA CUDA Tesla K80) (I3CLSimStepToPhotonConverterOpenCL.cxx:1212 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))
INFO (I3CLSimStepToPhotonConverterOpenCL): Num photons to copy (buffer 0): 121236 (I3CLSimStepToPhotonConverterOpenCL.cxx:1092 in void I3CLSimStepToPhotonConverterOpenCL::OpenCLThread_impl_downloadPhotons(boost::this_thread::disable_interruption &, bool &, unsigned int, uint32_t))
INFO (I3CLSimModule): Flushing I3Tray.. (I3CLSimModule.cxx:1639 in virtual void I3CLSimModule::Finish())
0.32s user 0.29s system 0% cpu 12:18.44 total
From that output, the scaling factor is 2600.
I'm updating the above plots to take this scaling factor into account: https://github.com/fiedl/hole-ice-study/commit/a4a466b3b00612017cd2574c0e66e4441bb53b1a
[2018-05-26 11:34:25] fiedl@fiedl-mbp ~/hole-ice-study/scripts/lib master ⚡
▶ python plot_simulation_time_vs_number_of_photons.py
Note that the above gauging simulation has not the same hole-ice cylinder configuration as the simulations that are the data source of the plot. Thus, the simulation time per photon can only show the order of magnitude.
Nevertheless, it shows that the simulation time per photon is of the same magnitude as the time spent in the kernel per photon. This is plausible because the main purpose of the simulation is photon propagation, which is done in the kernel. This should be where the intensive work is done. :ok:
Furthermore, with the new output of the total number of photons, the time spent in the kernel can actually be compared to the overall simulation time, which will be done during profiling: https://github.com/fiedl/hole-ice-study/issues/69.
This issue has been resolved by memory optimizations (#70).
During preliminary flasher studies (https://github.com/fiedl/hole-ice-study/issues/59), several workers run into resource limits (https://github.com/fiedl/hole-ice-study/issues/62) including run time.
EDIT: This issue has been resolved by memory optimizations (#70).