apt-sim / AdePT

Accelerated demonstrator of electromagnetic Particle Transport
Apache License 2.0
25 stars 34 forks source link

Draft: Factor build/link/use of VecGeom/CUDA using libraries for simplicity #279

Closed drbenmorgan closed 4 months ago

drbenmorgan commented 6 months ago

Due to VecGeom's legacy CUDA design and the exposure of device functions in public APIs of libraries, ensuring correct singular device code links in final executables is, and continues to be, awkward. This largely means reproducing VecGeom's pattern of shared/static/device linked libraries downstream so that usage can be appropriately tracked and the correct linking done at each stage. A suitable CMake module that packages all of the needed functionality is available from the Celeritas project where it's also been tested across a range of use cases and is therefore now pretty robust (as it can be).

This PR imports this CMake module into AdePT (it's a completely standalone file) and ports the build, link and use of AdePT to use it. The primary changes are:

I've tested this locally on a Rocky 9/GCC 11/Cuda 12 system with an underlying VecGeom install with static libs and CUDA enabled. The build and tests all run fine and without problem. I've also been able to take one of the examples, recast it into a standalone project and build and run it against an install of AdePT. However, CI and wider use may show up additional bugs...

Pinging @pcanal and @sethrj for info and any additional comments they want to make.

phsft-bot commented 6 months ago

Can one of the admins verify this patch?

pcanal commented 6 months ago

For simplicity of future update to CudaRdcUtils.cmake, the git commit log should indicate which commit from either the Celeritas repository or the CudaRdc repository the file was copied from.

agheata commented 6 months ago

@drbenmorgan , both me an @JuanGonzalezCaminero are getting locally linking errors with this branch. I have the VecGeom master and cmake 3.28.3

[ 47%] Linking CXX executable ../BuildProducts/bin/test_atomic
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::ReturnTracks(std::vector<adeptint::TrackData, std::allocator<adeptint::TrackData> >*, int)'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::ProcessGPUHits(HostScoring&, HostScoring::Stats&)'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::InitScoringData(adeptint::VolAuxData*)'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::GetUniformFieldZ()'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::InitVolAuxData(adeptint::VolAuxData*, G4HepEmState*, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::CheckGeometry(G4HepEmState*)'

However, I checked and the symbols are defined in libAdePT_G4_integration.so

0000000000071410 T AdePTGeant4Integration::ReturnTracks(std::vector<adeptint::TrackData, std::allocator<adeptint::TrackData> >*, int)

The error is the same if I change manually the link order of libAdePT.so and libAdePT_G4_integration.so Any idea?

agheata commented 6 months ago

also getting when linking example1:

make[1]: *** [CMakeFiles/Makefile2:1376: test/CMakeFiles/test_atomic.dir/all] Error 2
/usr/bin/ld: /home/agheata/geant_src/VecGeom_install/lib/libvecgeom.a(UnplacedCone.cpp.o): undefined reference to symbol '_ZNK7vecgeom3cxx9DevicePtrINS_4cuda13SUnplacedConeINS2_9ConeTypes13UniversalConeEEEE9ConstructIJdddddddEEEvDpT_'
/usr/bin/ld: /home/agheata/geant_src/VecGeom_install/lib/libvecgeomcuda.so: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [examples/Example1/CMakeFiles/example1.dir/build.make:392: BuildProducts/bin/example1] Error 1
drbenmorgan commented 6 months ago

@agheata, I've been chatting to @JuanGonzalezCaminero on Mattermost about this - I'm in the process of rebuilding my local stack with the newer CUDA and CMake that he has to see if I can reproduce. What I am now spotting is there is a undeclared (to CMake) circular dependence between libAdePT and libAdePT_G4_integration....

drbenmorgan commented 6 months ago

@pcanal may have insight on the

/usr/bin/ld: /home/agheata/geant_src/VecGeom_install/lib/libvecgeomcuda.so: error adding symbols: DSO missing from command line

error. I believe was seen as well in Celeritas and patched?

drbenmorgan commented 6 months ago

One other thing to check: what is the full list of Vecgeom libraries in your installs (e.g. under /home/agheata/geant_src/VecGeom_install/lib)? I've tried locally with both static and shared builds and am simply not able to reproduce. Could you also let me know the versions of VecGeom and Geant4 (down to commit) you are building against, and the exact set of build options used to configure and install them please?

There's some weird difference or specific configuration setup that's causing these problems for you but not me or the CI system.

pcanal commented 6 months ago

@pcanal may have insight on the

/usr/bin/ld: /home/agheata/geant_src/VecGeom_install/lib/libvecgeomcuda.so: error adding symbols: DSO missing from command line

error. I believe was seen as well in Celeritas and patched?

Yes we have seem similar but slightly different. The fix was celeritas-project/celeritas@03b449a (#1102). A description of the problem can be found at: https://zhangboyi.gitlab.io/post/2020-09-14-resolve-dso-missing-from-command-line-error/

The bottom-line is that you might need to add explicitly more library to the link library list or .... something is messed up in the installation (eg. installation of both shared and static version of VecGeom in the same area or .... )

agheata commented 6 months ago

there is a undeclared (to CMake) circular dependence between libAdePT and libAdePT_G4_integration....

Indeed, we are looking now into how to break the dependency of lib AdePT on libAdePT_G4_integration

agheata commented 6 months ago

One other thing to check: what is the full list of Vecgeom libraries in your installs (e.g. under /home/agheata/geant_src/VecGeom_install/lib)?

libvecgeom.a libvecgeomcuda.so libvecgeomcuda_static.a libvgdml.a

Could you also let me know the versions of VecGeom and Geant4 (down to commit) you are building against, and the exact set of build options used to configure and install them please? * 89ec50d4 (HEAD -> master, origin/master, origin/HEAD) VECGEOM-618 : Added new definition of SafetyToOut to fix SafetyToOut issue as mentioned in JIRA issue-618 [Raman Sehgal]

cmake  $GEANT_ROOT/VecGeom \
         -DCMAKE_INSTALL_PREFIX="$INSTALL_FOLDER" \
         -DCMAKE_PREFIX_PATH="$VECCORE_ROOT" \
         -DVECGEOM_GDML=ON \
         -DVECGEOM_ROOT=OFF \
         -DVECGEOM_GEANT4=OFF \
         -DVECGEOM_ENABLE_CUDA=ON \
         -DVECGEOM_BACKEND=Scalar \
         -DVECGEOM_NO_SPECIALIZATION=ON \
         -DVECGEOM_VECTOR=avx2 \
         -DVECGEOM_TEST_BENCHMARK=ON \
         -DVECGEOM_TEST_COVERAGE=OFF \
         -DCMAKE_BUILD_TYPE=$BUILDMODE \
         -DCMAKE_CUDA_ARCHITECTURES=75 \
         -DUSE_CACHED_TRANSFORMATIONS=ON \
         -DCMAKE_CXX_STANDARD=17 \
         -DCMAKE_CUDA_STANDARD=17 \
         -DEXAMPLES=ON

and for AdePT: Geant4 v11.2.0

cmake -S. -B./adept-build/${VGMODE} \
      -DCMAKE_INSTALL_PREFIX="$INSTALL_FOLDER" \
      -DCMAKE_BUILD_TYPE=$BUILDMODE \
      -DCMAKE_PREFIX_PATH="$VECCORE_ROOT;$VECGEOMROOT/lib/cmake/VecGeom;$HEPMC3_ROOT;$G4INSTALL;$G4HepEm_INSTALL" \
      -DCMAKE_CUDA_ARCHITECTURES=75 \
      -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
      -DADEPT_USE_SURF=OFF \
      -DBENCHMARK=ON \
      -DBUILD_TESTING=ON
drbenmorgan commented 6 months ago

Thanks for the info @agheata, I think the crucial thing is that we have the same VecGeom library structure here, i.e. only a static libvecgeom. Let's take one step back to the error you reported with the build of test_atomic:

> [ 47%] Linking CXX executable ../BuildProducts/bin/test_atomic
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::ReturnTracks(std::vector<adeptint::TrackData, std::allocator<adeptint::TrackData> >*, int)'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::ProcessGPUHits(HostScoring&, HostScoring::Stats&)'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::InitScoringData(adeptint::VolAuxData*)'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::GetUniformFieldZ()'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::InitVolAuxData(adeptint::VolAuxData*, G4HepEmState*, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::CheckGeometry(G4HepEmState*)'

Could you run:

$ make clean && make VERBOSE=1 test_atomic

and post the compile/link commands for the source file and end executable (should be the last two) please? For comparison, with my local, working build, I get (paths reduced for clarity):

[13/14] /.../nvcc -forward-unknown-to-host-compiler -O2 -g -DNDEBUG "--generate-code=arch=compute_86,code=[compute_86,sm_86]" /.../BuildProducts/lib64/libAdePT_static.a /.../lib64/libvecgeomcuda_static.a -Xnvlink --suppress-stack-size-warning -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink test/CMakeFiles/test_atomic.dir/test_atomic.cu.o -o CMakeFiles/test_atomic.dir/cmake_device_link.o -L/.../lib/stubs  -L/...cuda.../lib /.../libvgdml.a /.../libvecgeom.a -lrt -lpthread -ldl  -lcudadevrt -lcudart

[14/14] : && /usr/bin/c++ -O2 -g -DNDEBUG  test/CMakeFiles/test_atomic.dir/test_atomic.cu.o CMakeFiles/test_atomic.dir/cmake_device_link.o -o BuildProducts/bin/test_atomic -L/...cuda.../stubs   -L/...cuda.../lib -Wl,-rpath,/.../  BuildProducts/lib64/libAdePT_G4_integration.so  BuildProducts/lib64/libAdePT.so  /.../libvecgeomcuda.so  /.../libvgdml.a  /.../libvecgeom.a  -lrt  -lpthread  -ldl  /...geant4, g4hepemlibs.../  -lcudadevrt  -lcudart
JuanGonzalezCaminero commented 6 months ago

This is what I get for test_atomic @drbenmorgan

[100%] Linking CUDA device code CMakeFiles/test_atomic.dir/cmake_device_link.o
cd /home/gjuan/Dev/G4/AdePT_Master/build/test && /usr/bin/cmake -E cmake_link_script CMakeFiles/test_atomic.dir/dlink.txt --verbose=1
/usr/local/cuda-12.3/bin/nvcc -forward-unknown-to-host-compiler -O3 -DNDEBUG "--generate-code=arch=compute_89,code=[compute_89,sm_89]" /.../libAdePT_static.a /.../libvecgeomcuda_static.a -Xnvlink --suppress-stack-size-warning -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink --options-file CMakeFiles/test_atomic.dir/deviceObjects1.rsp -o CMakeFiles/test_atomic.dir/cmake_device_link.o --options-file CMakeFiles/test_atomic.dir/deviceLinkLibs.rsp

[100%] Linking CXX executable ../BuildProducts/bin/test_atomic
cd /home/gjuan/Dev/G4/AdePT_Master/build/test && /usr/bin/cmake -E cmake_link_script CMakeFiles/test_atomic.dir/link.txt --verbose=1
/usr/bin/c++ -O3 -DNDEBUG CMakeFiles/test_atomic.dir/test_atomic.cu.o CMakeFiles/test_atomic.dir/cmake_device_link.o -o ../BuildProducts/bin/test_atomic   -L/usr/local/cuda-12.3/targets/x86_64-linux/lib/stubs  -L/usr/local/cuda-12.3/targets/x86_64-linux/lib  -L/usr/lib/gcc/x86_64-linux-gnu/12  -Wl,-rpath,/.../ ../BuildProducts/lib/libAdePT_G4_integration.so ../BuildProducts/lib/libAdePT.so /.../libvecgeomcuda.so /.../libvgdml.a /.../libvecgeom.a -lrt -lpthread -ldl /.../libG4Tree.so /.../libG4FR.so /.../libG4GMocren.so /.../libG4visHepRep.so /.../libG4RayTracer.so /.../libG4VRML.so /.../libG4ToolsSG.so /.../libG4vis_management.so /.../libG4modeling.so /.../libG4interfaces.so /.../libG4persistency.so /usr/lib/x86_64-linux-gnu/libxerces-c.so /.../libG4error_propagation.so /.../libG4readout.so /.../libG4physicslists.so /.../libG4run.so /.../libG4parmodels.so /.../libg4HepEm.so /.../libG4event.so /.../libG4tracking.so /.../libg4HepEmInit.so /.../libG4processes.so /.../libG4analysis.so /.../libG4digits_hits.so /.../libG4track.so /.../libG4geometry.so /.../libG4graphics_reps.so /.../libG4expat.so /.../libG4particles.so /.../libG4materials.so /.../libG4intercoms.so /.../libG4global.so /.../libG4ptl.so.2.3.3 /.../libG4zlib.so /.../libg4HepEmRun.so /.../libG4clhep.so /.../libg4HepEmData.so -lcudadevrt -lcudart
drbenmorgan commented 6 months ago

Thanks @JuanGonzalezCaminero, is that failing with the same error as @agheata reported earlier? The only difference I can spot is the use of response files for the device link step. Could you check/post the contents of the files:

test/CMakeFiles/test_atomic.dir/deviceObjects1.rsp
test/CMakeFiles/test_atomic.dir/deviceLinkLibs.rsp

please?

edit: clarified the path to these files from the build directory.

JuanGonzalezCaminero commented 6 months ago

Yes, we were having the same errors, the files have:

deviceObjects1.rsp:

CMakeFiles/test_atomic.dir/test_atomic.cu.o

deviceLinkLibs.rsp

-L"/usr/local/cuda-12.3/targets/x86_64-linux/lib/stubs"  -L"/usr/local/cuda-12.3/targets/x86_64-linux/lib"  -L"/usr/lib/gcc/x86_64-linux-gnu/12"  /home/gjuan/Dev/G4/vecgeom_installation/lib/libvgdml.a /home/gjuan/Dev/G4/vecgeom_installation/lib/libvecgeom.a -lrt -lpthread -ldl  -lcudadevrt -lcudart
JuanGonzalezCaminero commented 6 months ago

Indeed I'm able to build after the last commit @drbenmorgan

drbenmorgan commented 6 months ago

O.k., I have absolutely no idea why things are failing on these specific machines, so the latest commit is a shot in the dark. All it does is add a dummy .cu file to AdePT_G4_integration. This forces it to be treated as an RDC library and should then appear on the device link line for test_atomic. This might resolve the missing symbol errors on the CERN machines, but if not then....

drbenmorgan commented 6 months ago

Thanks for the update @JuanGonzalezCaminero, that's something of a relief! If you could also check that your separate "findpackage_test" example building against an install also works, that'd cover that base as well.

All I can really imagine is that this is something to do with GCC - could you and @agheata double check and confirm which Linux, gcc and associated toolchain is in use please (i.e. gcc, gcc-ar gcc-ranlib etc)?

JuanGonzalezCaminero commented 6 months ago

Ah, I wasn't building the examples, I confirm the commit solved the linking issue with AdePT_G4_Integration, however the vecgeom DSO missing from command line error still appears, so I can't test the external linking yet.

drbenmorgan commented 6 months ago

Right, same exercise as before, so could you post the full link line for example1 please? It'll be the last one if you run

$ make clean && make VERBOSE=1 example1
JuanGonzalezCaminero commented 6 months ago

This is what I get:

cd /home/gjuan/Dev/G4/AdePT_Master/build/examples/Example1 && /usr/bin/cmake -E cmake_link_script CMakeFiles/example1.dir/link.txt --verbose=1
/usr/bin/c++ -O3 -DNDEBUG CMakeFiles/example1.dir/example1.cpp.o CMakeFiles/example1.dir/__/common/src/ParticleGun.cc.o CMakeFiles/example1.dir/__/common/src/ParticleGunMessenger.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_HepEm.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_AdePT.cc.o CMakeFiles/example1.dir/src/ActionInitialisation.cc.o CMakeFiles/example1.dir/src/DetectorConstruction.cc.o CMakeFiles/example1.dir/src/DetectorMessenger.cc.o CMakeFiles/example1.dir/src/EventAction.cc.o CMakeFiles/example1.dir/src/EventActionMessenger.cc.o CMakeFiles/example1.dir/src/SteppingAction.cc.o CMakeFiles/example1.dir/src/SimpleHit.cc.o CMakeFiles/example1.dir/src/PrimaryGeneratorAction.cc.o CMakeFiles/example1.dir/src/RunAction.cc.o CMakeFiles/example1.dir/src/SensitiveDetector.cc.o CMakeFiles/example1.dir/src/TrackingAction.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReader.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReaderMessenger.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4Interface.cc.o -o ../../BuildProducts/bin/example1  -Wl,-rpath,/.../ /.../HepMC3libs ../../BuildProducts/lib/libAdePT_G4_integration_final.so ../../BuildProducts/lib/libAdePT_G4_integration.so ../../BuildProducts/lib/libAdePT.so /.../libvecgeomcuda.so /.../libvgdml.a /.../libvecgeom.a -lrt -lpthread -ldl /.../Geant4libs /.../G4HepEMlibs
drbenmorgan commented 6 months ago

Which looks identical to mine... One last thing to try - could you modify the link command in examples/Example1/CMakeLists.txt to:

cuda_rdc_target_link_libraries(example1
  PRIVATE
    AdePT_G4_integration
    AdePT
    VecGeom::vecgeom
    ${HEPMC3_LIBRARIES} 
    ${HEPMC3_FIO_LIBRARIES}
)

Note the extra VecGeom link. This should fully declare that as needed in the link rather than relying on transitive inclusion. If that still fails, could you also:

  1. Wipe that build directory and start from scratch (I have seen occasional caching issues with RDC builds)
  2. If it fails again, post the link line now used.
drbenmorgan commented 6 months ago

As noted in #280, we should wait on the fix to the circular dependency presented there to go through with the existing link structure. I can then rebase here so we can at least decouple these issues (if they are not already orthogonal).

As I commented in #280:

FWIW, I can build the changes here without issue on my local system (Alma 9) with either GCC 11.4 or 12.2 and with CUDA > 12.0 or 12.3, plus a standard VecGeom CUDA enabled install.

and also the CI machine is passing. @agheata, @JuanGonzalezCaminero, which machines are you using, and are they accessible for CERN account holders at all? Are you able to reproduce the errors on lxplus-gpu? We can discuss further on Mattermost as required, but there's something very specific to your setups that's exposing this problem.

One additional cross-check to try (if @pcanal and @sethrj don't mind):

  1. Could @agheata and @JuanGonzalezCaminero try building the current Celeritas develop branch on the systems in question using the Geant4/VecGeom/etc setup as for AdePT
  2. Could @pcanal and @sethrj try building this PR in their Celeritas dev environment(s)

This should expose any fundamental system problems.

JuanGonzalezCaminero commented 6 months ago

@drbenmorgan Building the current Celeritas develop branch with the following configuration:

CXX=/usr/bin/gcc-10 cmake -S. -B./build 
    -DCMAKE_INSTALL_PREFIX="../celeritas_installation" 
    -DCMAKE_PREFIX_PATH="../geant4_installation;../vecgeom_installation" 
    -DCMAKE_CUDA_ARCHITECTURES=89

I get the following when it tries to link orange-update

[ 65%] Linking CXX executable ../bin/orange-update
/usr/bin/ld: CMakeFiles/orange-update.dir/orange-update.cc.o: undefined reference to symbol '_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc@@GLIBCXX_3.4.21'
/usr/bin/ld: /lib/x86_64-linux-gnu/libstdc++.so.6: error adding symbols: DSO missing from command line

But in this case it seems unrelated. It seems I can't compile with gcc 12.3, the compiler I used instead is gcc 10.5

sethrj commented 6 months ago

@JuanGonzalezCaminero that last error message looks like you're mixing incompatible compilers...

JuanGonzalezCaminero commented 6 months ago

It may well be, yes. What combination of gcc - nvcc are you normally using? @sethrj

sethrj commented 6 months ago

On our development machine we use GCC 8 and CUDA 11.5. But the problem you seem to be having is that you've likely compiled VecGeom/Geant4 with one version of GCC and are building Celeritas with a different one.

sethrj commented 6 months ago
  1. Could @pcanal and @sethrj try building this PR in their Celeritas dev environment(s)

This should expose any fundamental system problems.

CMake Error at CMakeLists.txt:76 (message):
AdePT requires at least VecGeom 2.0.0-dev.3, found 1.2.4

@drbenmorgan I don't even see a 2.0.0-dev.3 tag for vecgeom...

sethrj commented 6 months ago
CMake Warning at CMakeLists.txt:133 (find_package):
  By not providing "FindG4HepEm.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "G4HepEm", but
  CMake did not find one.

  Could not find a package configuration file provided by "G4HepEm" with any
  of the following names:

    G4HepEmConfig.cmake
    g4hepem-config.cmake

  Add the installation prefix of "G4HepEm" to CMAKE_PREFIX_PATH or set
  "G4HepEm_DIR" to a directory containing one of the above files.  If
  "G4HepEm" provides a separate development package or SDK, be sure it has
  been installed.

CMake Error at cmake/CudaRdcUtils.cmake:617 (set_target_properties):
  Target "AdePT_objects" links to:

    G4HepEm::g4HepEm

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  cmake/CudaRdcUtils.cmake:746 (cuda_rdc_use_middle_lib_in_property)
  CMakeLists.txt:180 (cuda_rdc_target_link_libraries)

I guess it requires g4hepem as an external install now rather than a builtin?

sethrj commented 6 months ago

You should add a check to the main CMakeLists that G4HepEm is built with GPU enabled; otherwise there are a bunch of build errors.

drbenmorgan commented 6 months ago

@sethrj, ah *&^% I'd forgotten about these so don't worry about doing the build for now. I have a patch to spack for the vecgeom commit and file for g4hepem if you wish but it's not critical. Let me just triple-check that I can build celeritas develop locally.

sethrj commented 6 months ago

@drbenmorgan On my third try I got it working πŸ˜„ master commit of g4hepem and vecgeom, on your branch, built with -DCMAKE_CUDA_ARCHITECTURES=70 -DCMAKE_BUILD_TYPE=RelWithDebInfo and no errors and everything seems to run fine.

drbenmorgan commented 6 months ago

Thanks @sethrj, much appreciated πŸ™‡! That was with GCC8, CUDA 11.5 (which OS?)?

sethrj commented 6 months ago

linux-rhel8-cascadelake, gcc 8.5.0, CUDA 11.5.

drbenmorgan commented 6 months ago

O.k., on my setup (linux-rocky9-cascadelake, gcc 11.4.1, CUDA 12) the same spack env builds both this PR and celeritas develop perfectly (so even Celeritas with Vecgeom 2.0.0-rc2, which is this tag/tarball

JuanGonzalezCaminero commented 6 months ago

@drbenmorgan We just merged the removal of circular dependencies. Could this PR be rebased onto master?

drbenmorgan commented 6 months ago

Latest commits are a very quick and dirty rebase on the latest master, with fixups for the change to header only etc. It works locally for me modulo some device code warnings on flags and that the testField test doesn't pass. However, I want to see what the CI shows first.

drbenmorgan commented 6 months ago

O.k., CI looks good, so @JuanGonzalezCaminero, @agheata could you try building again and see if things are now resolved on your side please?

JuanGonzalezCaminero commented 6 months ago

Still getting one error here:

[100%] Linking CXX executable ../../BuildProducts/bin/example1
/usr/bin/ld: ../../BuildProducts/lib/libAdePT_G4_integration.so: undefined reference to `__cudaRegisterLinkedBinary_5ad212d0_23_AdePTTrackingManager_cu_6f198e90_884112'

Simplified link line:

/usr/bin/c++ -O3 -DNDEBUG CMakeFiles/example1.dir/example1.cpp.o CMakeFiles/example1.dir/__/common/src/ParticleGun.cc.o CMakeFiles/example1.dir/__/common/src/ParticleGunMessenger.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_HepEm.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_AdePT.cc.o CMakeFiles/example1.dir/src/ActionInitialisation.cc.o CMakeFiles/example1.dir/src/DetectorConstruction.cc.o CMakeFiles/example1.dir/src/DetectorMessenger.cc.o CMakeFiles/example1.dir/src/EventAction.cc.o CMakeFiles/example1.dir/src/EventActionMessenger.cc.o CMakeFiles/example1.dir/src/SteppingAction.cc.o CMakeFiles/example1.dir/src/SimpleHit.cc.o CMakeFiles/example1.dir/src/PrimaryGeneratorAction.cc.o CMakeFiles/example1.dir/src/RunAction.cc.o CMakeFiles/example1.dir/src/SensitiveDetector.cc.o CMakeFiles/example1.dir/src/TrackingAction.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReader.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReaderMessenger.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4Interface.cc.o -o ../../BuildProducts/bin/example1  -Wl,-rpath,/.../ HepMC3libs ../../BuildProducts/lib/libAdePT_G4_integration_final.so ../../BuildProducts/lib/libAdePT_G4_integration.so VecGeomlibs -lrt -lpthread -ldl XercesC G4libs G4HepEMlibs

And the same line on Master:

/usr/bin/c++ -O3 -DNDEBUG CMakeFiles/example1.dir/example1.cpp.o CMakeFiles/example1.dir/__/common/src/ParticleGun.cc.o CMakeFiles/example1.dir/__/common/src/ParticleGunMessenger.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_HepEm.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_AdePT.cc.o CMakeFiles/example1.dir/src/ActionInitialisation.cc.o CMakeFiles/example1.dir/src/DetectorConstruction.cc.o CMakeFiles/example1.dir/src/DetectorMessenger.cc.o CMakeFiles/example1.dir/src/EventAction.cc.o CMakeFiles/example1.dir/src/EventActionMessenger.cc.o CMakeFiles/example1.dir/src/SteppingAction.cc.o CMakeFiles/example1.dir/src/SimpleHit.cc.o CMakeFiles/example1.dir/src/PrimaryGeneratorAction.cc.o CMakeFiles/example1.dir/src/RunAction.cc.o CMakeFiles/example1.dir/src/SensitiveDetector.cc.o CMakeFiles/example1.dir/src/TrackingAction.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReader.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReaderMessenger.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4Interface.cc.o -o ../../BuildProducts/bin/example1   -L/usr/local/cuda-12.3/targets/x86_64-linux/lib/stubs  -L/usr/local/cuda-12.3/targets/x86_64-linux/lib  -L/usr/lib/gcc/x86_64-linux-gnu/12  -Wl,-rpath,/.../ ../../BuildProducts/lib/libAdePT_G4_integration.so HepMC3libs VecGeomlibs -lrt -lpthread -ldl XercesC G4libs G4HepEMlibs -lcudadevrt -lcudart_static -lrt -lpthread -ldl 
drbenmorgan commented 6 months ago

Argh... Could you try adding this line:

target_link_options(example1 PRIVATE "-Wl,--no-as-needed")

at line 64 of examples/example1/CMakeLists.txt please? Are you running on an Ubuntu machine by any chance?

JuanGonzalezCaminero commented 6 months ago

That line solves it indeed. I'm running on Debian

sethrj commented 6 months ago

Yep, I found out quite some time ago that VecGeom requires this on Ubuntu systems: see https://wiki.gentoo.org/wiki/Project:Quality_Assurance/-Wl,-z,defs_and_-Wl,--no-allow-shlib-undefined .

drbenmorgan commented 6 months ago

Thanks both! @JuanGonzalezCaminero, could you try modifying the line to:

target_link_options(example1 PRIVATE "-Wl,-z,defs")

and see what it prints please? Not a fix, but useful to see what it produces. I'll have a think about how to fix this for installs, but in the meantime I think we can use a project wide no-as-needed for our own builds.

JuanGonzalezCaminero commented 6 months ago

As far as I can tell I get the same error and link line when using those options. Is there something specific I should look for?

drbenmorgan commented 6 months ago

If I understand the option correctly, it should report specifics on what's missing, though perhaps it needs to be applied to the libraries itself? Anyway, I'm now able to reproduce the error locally by adding the --as-needed flag to my builds so can at leats check fixes locally (which I'll do).

agheata commented 6 months ago

@drbenmorgan it also works for me when adding target_link_options(integrationBenchmark PRIVATE "-Wl,--no-as-needed"), do you plan to add it globally for now in this PR? Or should we use -Wl,-z,defs during the creation of the libAdePT_G4_integration libs to force the resolution of all dependencies before the examples are built?

drbenmorgan commented 6 months ago

IIUC, -Wl,-z,defs is only for library developers to warn, rather than fix things (but I need to explore this more). The idea for now would be to use --no-as-needed globally in AdePT, and then document that this may be needed by clients to resolve things.

As @WitekPokorski reported that the current AdePT libraries had been successfully linked in Gaussino, I'm first going to go back to the celer-adept joint integration layer and see what happens there when fully linking in and using AdePT before deciding how to progress this PR (I also want to go back and update g4vg and its use as I think that's a higher priority?).

drbenmorgan commented 4 months ago

Closing as superseded by #289