Open sethrj opened 3 months ago
@esseivaju 's runs on perlmutter confirm this is an ORANGE issue:
OK, the difference was from changing from the pregenerated .org.json
to the built-in GDML generator... not clear why
@esseivaju has been doing some profiling of the along-step kernels. Long story short: the T- cases ("EM flat") are doing the same number of operations but the memory access are much more expensive.
It seems the main difference is the empty "calorimeter" volume, which is a box with all the inner boxes subtracted out: the manually constructed version doesn't have this. Even though the volume is never actually entered, it results in a logic expression for the volume that's huge, leading to a major change in "max faces" and "max intersections" which are used by the ORANGE state data:
"orange": {
"scalars": {
"max_depth": 1,
-"max_faces": 12,
-"max_intersections": 12,
-"max_logic_depth": 3,
+"max_faces": 111,
+"max_intersections": 111,
+"max_logic_depth": 2,
"tol": {
-"abs": 1e-08,
-"rel": 1e-08
+"abs": 1.5e-09,
+"rel": 1.5e-08
}
},
"sizes": {
"bih": {
"bboxes": 102,
-"inner_nodes": 0,
-"leaf_nodes": 1,
+"inner_nodes": 99,
+"leaf_nodes": 100,
"local_volume_ids": 102
},
"connectivity_records": 111,
"daughters": 0,
-"local_surface_ids": 618,
-"local_volume_ids": 305,
-"logic_ints": 87,
+"local_surface_ids": 717,
+"local_volume_ids": 301,
+"logic_ints": 45,
"real_ids": 111,
-"reals": 103,
+"reals": 104,
"rect_arrays": 0,
"simple_units": 1,
"surface_types": 111,
I'm not sure whether our priority should be (1) patch this up somehow, (2) implementing different striding patterns and padding for GPU/CPU so that we can coalesce memory accesses in SimpleUnitTracker::simple_intersect
/SimpleUnitTracker::intersect_impl
; or (3) speed up our plans for short circuiting and evaluate the boundary distances on the fly rather than storing them to effectively "local" memory.
BTW I found this using git bisect
: very glad (@pcanal ) that we use a squash merge strategy 😉
$ git bisect log
# bad: [21940769b1bfba623289ad1ebb3fe30b81d1b5e4] Update frontier toolchain and backward compatibility (#1330)
# good: [b1e52f2001c73fd887ab26dfa100480b5b146167] Add Cerenkov distribution and generator (#1080)
git bisect start 'HEAD' 'v0.5.0-dev'
# bad: [89336e59b66fad695833cba4c65fac976918e60e] Add Windows/Linux no-dependency builds (#1196)
git bisect bad 89336e59b66fad695833cba4c65fac976918e60e
# good: [18f4ae60832c06e70d24b915f0781df38474e038] Fix missing Werror in build-fast workflow (#1141)
git bisect good 18f4ae60832c06e70d24b915f0781df38474e038
# good: [4dd382ff2bcd4f1a5c4ca9a0605c8e28474b0f27] Add sense evaluator for testing (#1168)
git bisect good 4dd382ff2bcd4f1a5c4ca9a0605c8e28474b0f27
# good: [e1c7aedce274dece374ffeb8e093c1793d5b2e77] Pin sphinx at 7.2 to fix user doc build (#1188)
git bisect good e1c7aedce274dece374ffeb8e093c1793d5b2e77
# bad: [57ef806c1424e11f03299f0443fe60396e28e77d] Update esseivaj user presets (#1195)
git bisect bad 57ef806c1424e11f03299f0443fe60396e28e77d
# bad: [2be4ebb86ff1b764a04d7e931f25b78d56c963ae] Switch ORANGE unit tests to use GDML files (#1181)
git bisect bad 2be4ebb86ff1b764a04d7e931f25b78d56c963ae
# skip: [4e9676e42f064a40e9fdc4b0e949e128c2b99d1a] Define geometry traits (#1190)
git bisect skip 4e9676e42f064a40e9fdc4b0e949e128c2b99d1a
# skip: [fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041] Complete GDML-to-ORANGE geometry converter (#1180)
git bisect skip fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041
# only skipped commits left to test
# possible first bad commit: [2be4ebb86ff1b764a04d7e931f25b78d56c963ae] Switch ORANGE unit tests to use GDML files (#1181)
# possible first bad commit: [4e9676e42f064a40e9fdc4b0e949e128c2b99d1a] Define geometry traits (#1190)
# possible first bad commit: [fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041] Complete GDML-to-ORANGE geometry converter (#1180)
Yeah the ugly length-111 cell is the "background" volume (world).
It seems that the only real slowdown is because of the less compact data layout for the temporary states: I don't think the background cell is even being reached in practice. So we really need a more optimal way (BIH/BVH traversal) for tracking across these.
The plan discussed with @elliottbiondo:
On frontier, something catastrophic has happened to the performance of ORANGE in a couple of problems. Here's the relative throughput of 0.5.0-dev.202+141cd4928 compared to v0.4.4:
This is not due to a change in the number of steps (so the amount of work is the same):
As a reminder, the problem complexity and abbreviations: and 'M̃' means MSC is disabled.