Open krasznaa opened 4 months ago
In a dumb way I can make clusterization succeed with:
diff --git a/device/common/include/traccc/clusterization/device/ccl_kernel.hpp b/device/common/include/traccc/clusterization/device/ccl_kernel.hpp
index a61ba7a4..473dc36d 100644
--- a/device/common/include/traccc/clusterization/device/ccl_kernel.hpp
+++ b/device/common/include/traccc/clusterization/device/ccl_kernel.hpp
@@ -29,7 +29,7 @@ namespace details {
using index_t = unsigned short;
static constexpr int TARGET_CELLS_PER_THREAD = 8;
-static constexpr int MAX_CELLS_PER_THREAD = 12;
+static constexpr int MAX_CELLS_PER_THREAD = 16;
/// Helper struct for calculating some of the input parameters of @c ccl_kernel
struct ccl_kernel_helper {
@@ -46,7 +46,7 @@ struct ccl_kernel_helper {
max_cells_per_partition =
(target_cells_per_partition * MAX_CELLS_PER_THREAD +
TARGET_CELLS_PER_THREAD - 1) /
- TARGET_CELLS_PER_THREAD;
+ TARGET_CELLS_PER_THREAD * 2;
threads_per_partition =
(target_cells_per_partition + TARGET_CELLS_PER_THREAD - 1) /
TARGET_CELLS_PER_THREAD;
Though I'm really doing this in a very naive / dumb way at the moment. Do we expect that really busy pixel modules would result in too many cells getting assigned to threads / threadblocks with the current logic of the algorithm? :thinking:
Unfortunately once this is solved, the debug build then fails on:
Detector check: OK
WARNING: @traccc::io::csv::read_cells: 9843 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu100/event000000000-cells.csv
Assertion failed.
Thread 1 "traccc_seq_exam" received signal CUDA_EXCEPTION_12, Warp Assert.
[Switching focus to CUDA kernel 5, grid 45, block (57,0,0), thread (25,0,0), device 0, sm 47, warp 1, lane 25]
0x00007fff90f695a0 in __assert_fail ()
(cuda-gdb) bt
#0 0x00007fff90f695a0 in __assert_fail ()
#1 0x00007ffeebd33a00 in bool detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector, detray::darray>::step<detray::propagator<detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector, detray::darray>, detray::navigator<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap> > const, detray::navigation::void_inspector, detray::intersection2D<detray::surface_descriptor<detray::detail::typed_index<detray::default_metadata::mask_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, detray::detail::typed_index<detray::default_metadata::material_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, unsigned int, unsigned short>, detray::cmath<float> > >, detray::actor_chain<std::tuple, detray::pathlimit_aborter, detray::parameter_transporter<detray::cmath<float> >, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >, detray::pointwise_material_interactor<detray::cmath<float> >, traccc::ckf_aborter> >::state>(detray::propagator<detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector, detray::darray>, detray::navigator<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap> > const, detray::navigation::void_inspector, detray::intersection2D<detray::surface_descriptor<detray::detail::typed_index<detray::default_metadata::mask_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, detray::detail::typed_index<detray::default_metadata::material_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, unsigned int, unsigned short>, detray::cmath<float> > >, detray::actor_chain<std::tuple, detray::pathlimit_aborter, detray::parameter_transporter<detray::cmath<float> >, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >, detray::pointwise_material_interactor<detray::cmath<float> >, traccc::ckf_aborter> >::state&, detray::stepping::config<float> const&) ()
#2 0x00007fff2f7d9020 in bool detray::propagator<detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector, detray::darray>, detray::navigator<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap> > const, detray::navigation::void_inspector, detray::intersection2D<detray::surface_descriptor<detray::detail::typed_index<detray::default_metadata::mask_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, detray::detail::typed_index<detray::default_metadata::material_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, unsigned int, unsigned short>, detray::cmath<float> > >, detray::actor_chain<std::tuple, detray::pathlimit_aborter, detray::parameter_transporter<detray::cmath<float> >, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >, detray::pointwise_material_interactor<detray::cmath<float> >, traccc::ckf_aborter> >::propagate_sync<detray::propagator<detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector, detray::darray>, detray::navigator<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap> > const, detray::navigation::void_inspector, detray::intersection2D<detray::surface_descriptor<detray::detail::typed_index<detray::default_metadata::mask_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, detray::detail::typed_index<detray::default_metadata::material_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, unsigned int, unsigned short>, detray::cmath<float> > >, detray::actor_chain<std::tuple, detray::pathlimit_aborter, detray::parameter_transporter<detray::cmath<float> >, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >, detray::pointwise_material_interactor<detray::cmath<float> >, traccc::ckf_aborter> >::state, std::tuple<detray::pathlimit_aborter::state&, detray::parameter_transporter<detray::cmath<float> >::state&, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >::state&, detray::pointwise_material_interactor<detray::cmath<float> >::state&, traccc::ckf_aborter::state&> >(detray::propagator<detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector, detray::darray>, detray::navigator<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap> > const, detray::navigation::void_inspector, detray::intersection2D<detray::surface_descriptor<detray::detail::typed_index<detray::default_metadata::mask_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, detray::detail::typed_index<detray::default_metadata::material_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, unsigned int, unsigned short>, detray::cmath<float> > >, detray::actor_chain<std::tuple, detray::pathlimit_aborter, detray::parameter_transporter<detray::cmath<float> >, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >, detray::pointwise_material_interactor<detray::cmath<float> >, traccc::ckf_aborter> >::state&, std::tuple<detray::pathlimit_aborter::state&, detray::parameter_transporter<detray::cmath<float> >::state&, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >::state&, detray::pointwise_material_interactor<detray::cmath<float> >::state&, traccc::ckf_aborter::state&>&&) ()
#3 0x00007fff2f7d4bd0 in void traccc::cuda::kernels::propagate_to_next_surface<detray::propagator<detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector, detray::darray>, detray::navigator<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap> > const, detray::navigation::void_inspector, detray::intersection2D<detray::surface_descriptor<detray::detail::typed_index<detray::default_metadata::mask_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, detray::detail::typed_index<detray::default_metadata::material_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, unsigned int, unsigned short>, detray::cmath<float> > >, detray::actor_chain<std::tuple, detray::pathlimit_aborter, detray::parameter_transporter<detray::cmath<float> >, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >, detray::pointwise_material_interactor<detray::cmath<float> >, traccc::ckf_aborter> >, covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, traccc::finding_config<float> >(traccc::finding_config<float>, detray::propagator<detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector, detray::darray>, detray::navigator<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap> > const, detray::navigation::void_inspector, detray::intersection2D<detray::surface_descriptor<detray::detail::typed_index<detray::default_metadata::mask_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, detray::detail::typed_index<detray::default_metadata::material_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, unsigned int, unsigned short>, detray::cmath<float> > >, detray::actor_chain<std::tuple, detray::pathlimit_aborter, detray::parameter_transporter<detray::cmath<float> >, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >, detray::pointwise_material_interactor<detray::cmath<float> >, traccc::ckf_aborter> >::detector_type::view_type, covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, vecmem::data::jagged_vector_view<detray::propagator<detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector, detray::darray>, detray::navigator<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap> > const, detray::navigation::void_inspector, detray::intersection2D<detray::surface_descriptor<detray::detail::typed_index<detray::default_metadata::mask_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, detray::detail::typed_index<detray::default_metadata::material_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, unsigned int, unsigned short>, detray::cmath<float> > >, detray::actor_chain<std::tuple, detray::pathlimit_aborter, detray::parameter_transporter<detray::cmath<float> >, traccc::interaction_register<detray::pointwise_material_interactor<detray::cmath<float> > >, detray::pointwise_material_interactor<detray::cmath<float> >, traccc::ckf_aborter> >::intersection_type>, vecmem::data::vector_view<detray::bound_track_parameters<detray::cmath<float> > const>, vecmem::data::vector_view<traccc::candidate_link const>, unsigned int, unsigned int const&, vecmem::data::vector_view<detray::bound_track_parameters<detray::cmath<float> > >, vecmem::data::vector_view<unsigned int>, vecmem::data::vector_view<thrust::pair<int, unsigned int> >, vecmem::data::vector_view<unsigned int>, unsigned int&)<<<(111,1,1),(64,1,1)>>> ()
(cuda-gdb)
But I'll open a separate ticket about that one, once I understood a bit better what is going on there.
So... :thinking: By doing
diff --git a/device/common/include/traccc/clusterization/device/ccl_kernel.hpp b/device/common/include/traccc/clusterization/device/ccl_kernel.hpp
index a61ba7a4..2720bab9 100644
--- a/device/common/include/traccc/clusterization/device/ccl_kernel.hpp
+++ b/device/common/include/traccc/clusterization/device/ccl_kernel.hpp
@@ -29,7 +29,7 @@ namespace details {
using index_t = unsigned short;
static constexpr int TARGET_CELLS_PER_THREAD = 8;
-static constexpr int MAX_CELLS_PER_THREAD = 12;
+static constexpr int MAX_CELLS_PER_THREAD = 32;
/// Helper struct for calculating some of the input parameters of @c ccl_kernel
struct ccl_kernel_helper {
, I can make CUDA process the $\mu$ = 300 ttbar events as well. As long as I don't enable assertions...
[bash][Legolas]:traccc > ./out/build/sycl/bin/traccc_seq_example_cuda --detector-file=geometries/odd/odd-detray_geometry_detray.json --grid-file=geometries/odd/odd-detray_surface_grids_detray.json --use-detray-detector --digitization-file=geometries/odd/odd-digi-geometric-config.json --input-directory=odd/geant4_ttbar_mu300/ --input-events=10 --input-skip=0
Running Full Tracking Chain Using CUDA
>>> Detector Options <<<
Detector file : geometries/odd/odd-detray_geometry_detray.json
Material file :
Surface rid file : geometries/odd/odd-detray_surface_grids_detray.json
Use detray::detector: yes
Digitization file : geometries/odd/odd-digi-geometric-config.json
>>> Input Data Options <<<
Input data format : csv
Input directory : odd/geant4_ttbar_mu300/
Number of input events : 10
Number of input events to skip: 0
>>> Clusterization Options <<<
Target cells per partition: 1024
>>> Track Seeding Options <<<
None
>>> Track Finding Options <<<
Track candidates range : 3:100
Minimum step length for the next surface: 0.5 [mm]
Maximum step counts for the next surface: 100
Maximum Chi2 : 30
Maximum branches per step: 4294967295
Maximum number of skipped steps per candidates: 3
>>> Track Propagation Options <<<
Constraint step size : 3.40282e+38 [mm]
Overstep tolerance : -100 [um]
Minimum mask tolerance: 1e-05 [mm]
Maximum mask tolerance: 1 [mm]
Search window : 0 x 0
Runge-Kutta tolerance : 0.0001
>>> Performance Measurement Options <<<
Run performance checks: no
>>> Accelerator Options <<<
Compare with CPU results: no
WARNING: No material in detector
WARNING: No entries in volume finder
Detector check: OK
WARNING: No material in detector
WARNING: No entries in volume finder
Detector check: OK
WARNING: @traccc::io::csv::read_cells: 25530 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000000-cells.csv
WARNING: @traccc::io::csv::read_cells: 31945 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000001-cells.csv
WARNING: @traccc::io::csv::read_cells: 26973 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000002-cells.csv
WARNING: @traccc::io::csv::read_cells: 27856 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000003-cells.csv
WARNING: @traccc::io::csv::read_cells: 24369 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000004-cells.csv
WARNING: @traccc::io::csv::read_cells: 26512 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000005-cells.csv
WARNING: @traccc::io::csv::read_cells: 27034 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000006-cells.csv
WARNING: @traccc::io::csv::read_cells: 25225 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000007-cells.csv
WARNING: @traccc::io::csv::read_cells: 25553 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000008-cells.csv
WARNING: @traccc::io::csv::read_cells: 22854 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu300/event000000009-cells.csv
==> Statistics ...
- read 5572599 cells from 179427 modules
- created (cpu) 0 measurements
- created (cuda) 1517631 measurements
- created (cpu) 0 spacepoints
- created (cuda) 1517631 spacepoints
- created (cpu) 0 seeds
- created (cuda) 372048 seeds
- found (cpu) 0 tracks
- found (cuda) 693397 tracks
- fitted (cpu) 0 tracks
- fitted (cuda) 693397 tracks
==>Elapsed times...
File reading (cpu) 9997 ms
Clusterization (cuda) 76 ms
Spacepoint formation (cuda) 3 ms
Seeding (cuda) 109 ms
Track params (cuda) 2 ms
Track finding (cuda) 2046 ms
Track fitting (cuda) 551 ms
Wall time 13038 ms
[bash][Legolas]:traccc >
It's not great that we have such a fragile hardcoded maximum in our code, but for today this will have to do...
Yes, if there are too many cells before a viable partition point, then the code is expected to crash. The envisioned way out here is to do some on-device memory allocation (or to use some global memory scratch space) to process the excess cells, but this is not currently implemented as the probability of failure is astronomically small (this for a maximum partition size of 2048):
I doubt that's what's going on here, though. Unless the occupancy here is much higher than 1%.
The ODD ttbar simulation just keeps on giving...
While I can process the low-ish $\mu$ samples successfully, at and above $\mu$ = 100 I run into:
There seem to be multiple things going wrong actually... :thinking:
So https://github.com/acts-project/traccc/blob/main/device/common/include/traccc/clusterization/device/impl/ccl_kernel.ipp#L153-L197 makes some mistake with setting up the
partition_start
andpartition_end
values. :thinking:https://github.com/acts-project/traccc/blob/main/device/cuda/src/clusterization/clusterization_algorithm.cu#L80-L100
The silly
cell_links
buffer is allowed to be destroyed before the (asynchronous) kernel would stop running. :frowning:The latter one is fully on me, I was the one who introduced this behaviour in #545, mea culpa. But the first one I could do some help / insights with.
Pinging @beomki-yeo for info. :wink: