Add ability to select backend at run time

ariostas commented 1 year ago

Added the ability to select a backend during run time in a way that works both for the standalone version and for the CMSSW setup. This was done by compiling two separate libraries and selecting one at run time. Here are some points describing the reasoning for the changes.

For CMSSW both libraries need to have different names. We went with sdl_gpu and sdl_cpu, i.e. the files are libsdl_gpu.so and libsdl_cpu.so. @YonsiG and @VourMa will add additional changes to this PR and to the CMSSW repo to make it work.
For the standalone version it is much easier to pick between the two libraries when they have the same name. This is why symlinks are created in SDL/gpu/libsdl.so and SDL/cpu/libsdl.so. The library is now dynamically linked at runtime, allowing us to override which one it links to.
By default, sdl_make_tracklooper will now compile both libraries, but to save some time it is possible to exclusively compile the GPU or CPU library with the -G and -C flags. When both libraries are compiled, the GPU one is used as the default one since its path appears first in LD_LIBRARY_PATH.
sdl_run now has the option of only running the code and not recompiling it, simply by omitting the -f flag. A backend can be specified with the -b flag, e.g. -b cpu. This is the easiest way to pick a backend, but when running the sdl binary directly ~~it can be done with LD_PRELOAD=${TRACKLOOPERDIR}/SDL/cpu/libsdl.so sdl <args>.~~ EDIT: To run the sdl binary directly you need to use LD_LIBRARY_PATH=${TRACKLOOPERDIR}/SDL/cpu/:$LD_LIBRARY_PATH sdl <args> (see comment below).
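
The selection logic described in the points above can be sketched in shell roughly as follows. This is only an illustration under stated assumptions: the function names backend_lib_dir and first_libsdl are hypothetical and are not part of sdl_run or setup.sh. Only TRACKLOOPERDIR and the SDL/cpu and SDL/gpu directories come from the description above. The lookup is also simplified compared to what the dynamic linker really does.

```shell
# Hypothetical helper: map a backend name to the directory holding its
# libsdl.so symlink (layout taken from the PR description).
backend_lib_dir() {
    local backend="$1"
    case "$backend" in
        cpu|gpu) printf '%s/SDL/%s\n' "${TRACKLOOPERDIR}" "$backend" ;;
        *) echo "unknown backend: $backend" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the first libsdl.so found in a colon-separated
# search path. This mimics why the directory listed first wins.
first_libsdl() {
    local search_path="$1" dir
    local IFS=':'
    for dir in $search_path; do
        [ -n "$dir" ] || continue          # skip empty path entries
        if [ -e "$dir/libsdl.so" ]; then
            printf '%s\n' "$dir/libsdl.so"
            return 0
        fi
    done
    return 1                               # no libsdl.so on the search path
}

# Example use (placeholder arguments, not run here):
#   LD_LIBRARY_PATH="$(backend_lib_dir cpu):$LD_LIBRARY_PATH" sdl <args>
#   first_libsdl "$LD_LIBRARY_PATH"
```

Prepending the backend directory for a single command leaves the default from setup.sh untouched for later commands in the same shell.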

I'll do some more testing to make sure that everything looks good, and other people should also test it and review the changes closely because there were significant modifications to scripts and makefiles. I can update the readme when it's all settled.

YonsiG commented 1 year ago

Hi Andres, I am testing the CPU version of sdl_run but ran into an issue. I ran "sdl_run -s PU200 -n 1 -t testcpu -b cpu", but it says "/home/users/yagu/TrackLooper_Ntuple/LST_in_cmssw/TrackLooper/bin/sdl_run: line 159: 93187 Aborted (core dumped) LD_PRELOAD=/home/users/yagu/TrackLooper_Ntuple/LST_in_cmssw/TrackLooper/SDL/cpu/libsdl.so sdl -i PU200 -o ./testcpu_PU200_NEVT1LSTNtuple.root -n 1 >> ./testcpu_PU200_NEVT1LSTRun.log 2>&1 ERROR: sdl command failed!", and at the end of the log file it says "double free or corruption (!prev)". Can you have a look at this? Thanks!

ariostas commented 1 year ago

Thank you @YonsiG. I think at some point during testing I forgot to re-run the setup.sh script. What was happening was that LD_PRELOAD did preload the requested library, but the binary was still also loading the one it found through LD_LIBRARY_PATH, so both libraries ended up loaded at once. I changed it so that now, instead of using LD_PRELOAD, it temporarily updates LD_LIBRARY_PATH to point to the correct library. So the correct way of running the sdl binary directly should be LD_LIBRARY_PATH=${TRACKLOOPERDIR}/SDL/cpu/:$LD_LIBRARY_PATH sdl <args>.
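
One possible way to check for this kind of double loading from outside the process is to look at which libsdl files it has mapped. The sketch below assumes Linux with /proc available and assumes the mapped files have names matching libsdl*.so (for example libsdl_cpu.so and libsdl_gpu.so, since the symlinks should show up under their resolved names). The function name mapped_sdl_libs is hypothetical and not part of the repository.

```shell
# Hypothetical helper: read a /proc/<pid>/maps-style listing on stdin and
# print each distinct mapped file whose name looks like libsdl*.so.
mapped_sdl_libs() {
    # The last field of a maps line is the mapped file path (if any).
    awk '$NF ~ "/libsdl[^/]*\\.so" { print $NF }' | sort -u
}

# Example use (placeholder PID, not run here):
#   mapped_sdl_libs < /proc/<pid>/maps
```

If this prints more than one path for a running sdl process, both backends are probably loaded at the same time, which matches the failure described above.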

YonsiG commented 1 year ago

I tried running the sdl command directly using LD_LIBRARY_PATH=${TRACKLOOPERDIR}/SDL/cpu/:$LD_LIBRARY_PATH ./bin/sdl -i PU200 -v 0 -w 2 -n 1. One printout says "ana.do_run_cpu: 0", which does not look consistent with running the CPU library.

ariostas commented 1 year ago

I changed sdl.cc so that at runtime it checks which library is loaded. This way, it can correctly print whether it's running on cpu.

YonsiG commented 1 year ago

Thanks Andres! This PR was validated together with PR#15 in SegmentLinking/cmssw. The out-of-the-box mtv plots are here: http://uaf-10.t2.ucsd.edu/~yagu/SDL_GPU_plots/CMSSW_mtv/compileCPU_GPU/plots_final/plots_ootb/effandfakePtEtaPhi.pdf

GNiendorf commented 1 year ago

Also, how does this change work with the timing script and switching between backends? Does that need to be updated in any way?

edit: Ignore this, looks like it works fine with this PR.

ariostas commented 1 year ago

Hi @GNiendorf, the timing script should work fine with this PR. However, this was merged a bit prematurely and we need to add some documentation to the readme to explain the new changes. I'll work on it as soon as possible.

YonsiG commented 1 year ago

Additional comparisons with the original master branch:
CPU vs master CPU (same): http://uaf-10.t2.ucsd.edu/~yagu/SDL_GPU_plots/CMSSW_mtv/compileCPU_GPU/plots_master_compare/plots_cpu_compare/plots_ootb/effandfakePtEtaPhi.pdf
GPU vs master GPU (small fluctuations): http://uaf-10.t2.ucsd.edu/~yagu/SDL_GPU_plots/CMSSW_mtv/compileCPU_GPU/plots_master_compare/plots_gpu_compare/plots_ootb/effandfakePtEtaPhi.pdf

SegmentLinking / TrackLooper

Add ability to select backend at run time #341