DUNE / dune-tms

DUNE ND Temporary Muon Spectrometer

Current version of spill building for nersc production #59

Open jdkio opened 7 months ago

jdkio commented 7 months ago

This code is not ready.

I describe the current issues towards the end of this description.

First, this PR builds spills from multiple TG4 events using the spill time. overlaySinglesIntoSpillsSorted.C takes multiple input streams and combines them into a single file. The hit times are set according to the flux spill structure. The events are written to the output sequentially, but as separate TG4Events.

But dune-tms is set up to use edep-sim's overlay code, which combines all the hits into a single TG4Event so that we can process it as normal. Without this PR, the output of overlaySinglesIntoSpillsSorted ends up as separate events ordered in time.

In theory, this PR fixes the issue by combining events using the TMS_Event.AddEvent functionality. It's a draft because the code is messy and broken, and I'm low on time. I was trying to fix two issues.

1) The 1.2e9 spill offset causes issues with floats. A 32-bit float carries only about 7 significant digits, so near 1.2e9 adjacent representable values are 128 apart and a small offset is rounded away: 10 + 1.2e9 = 1.2e9 when using a float. The output branches are floats, so a lot of precision is lost. Switching to doubles causes some of the crashes described below; the causes aren't clear. One solution to the float issue is to subtract the spill offset from the hit time. First I tried passing the spill time into the hit and subtracting it there. This actually worked but was awkward, because the spill time has to be passed through both the hit and the true hit. The second solution was to undo the overlaySinglesIntoSpillsSorted time shift before creating the TMS_Event. This seemed to work but then started crashing again. That is what's currently in the code, and for some reason it's crashing; see below.

2) The second issue is the crashes described below. I'm not sure what's causing them, and I've run out of time to fix them. I don't know whether they are caused by adding events together or by something else.
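The precision loss in issue 1 can be reproduced in isolation. The following is a minimal sketch, not code from this PR: the helper names (FloatSpacingAt, AbsoluteTimeAsFloat, RelativeTimeAsFloat, CheckSpillTimePrecision) are hypothetical, and the only number taken from above is the 1.2e9 offset. It compares narrowing an absolute time to float with subtracting the offset in double precision first.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next larger representable float.
inline float FloatSpacingAt(float x) {
  return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Narrow an absolute hit time straight to float, as a float branch would.
inline float AbsoluteTimeAsFloat(double hit_time) {
  return static_cast<float>(hit_time);
}

// Subtract the spill offset in double precision first, then narrow.
inline float RelativeTimeAsFloat(double hit_time, double spill_offset) {
  return static_cast<float>(hit_time - spill_offset);
}

// Hypothetical self-check of the behaviour described in issue 1.
inline void CheckSpillTimePrecision() {
  const double spill_offset = 1.2e9;            // offset quoted in this PR
  const double hit_time = spill_offset + 10.0;  // hit shortly after the spill start

  // Near 1.2e9, neighbouring floats are 128 apart.
  assert(FloatSpacingAt(1.2e9f) == 128.0f);
  // Stored as an absolute float, the hit is indistinguishable from the offset.
  assert(AbsoluteTimeAsFloat(hit_time) == AbsoluteTimeAsFloat(spill_offset));
  // Stored relative to the spill start, the 10-unit offset survives.
  assert(RelativeTimeAsFloat(hit_time, spill_offset) == 10.0f);
}
```

Calling CheckSpillTimePrecision() from a small test program should pass silently. The point of the design is that the subtraction has to happen while the time is still a double; subtracting after the value has been stored as a float cannot recover the lost digits.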

About the crashes:

I get all sorts of crashes. Sometimes a full make clean helps. Most of the time the crashes are intermittent: running the same code several times produces different crashes, and then eventually it works. Sometimes a change to the code makes it crash every time for no obvious reason, and the only fix is to change it back.

The current crash is:

#15 std::_Destroy<TMS_TrueParticle*> (__last=<optimized out>, __first=<optimized out>) at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v9_3_0/Linux64bit+3.10-2.17/include/c++/9.3.0/bits/stl_construct.h:137
#16 std::_Destroy<TMS_TrueParticle*, TMS_TrueParticle> (__last=0xff19d40, __first=<optimized out>) at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v9_3_0/Linux64bit+3.10-2.17/include/c++/9.3.0/bits/stl_construct.h:206
#17 std::vector<TMS_TrueParticle, std::allocator<TMS_TrueParticle> >::~vector (this=0x7ffdddd15d40, __in_chrg=<optimized out>) at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v9_3_0/Linux64bit+3.10-2.17/include/c++/9.3.0/bits/stl_vector.h:677
#18 TMS_Event::~TMS_Event (this=0x7ffdddd15d20, __in_chrg=<optimized out>) at ../src/TMS_Event.h:19
#19 0x0000000000409592 in ConvertToTMSTree (filename=..., output_filename=...) at ConvertToTMSTree.cpp:171

Here are some other examples:

*** Error in `ConvertToTMSTree.exe': corrupted size vs. prev_size: 0x000000000e13f6f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7ff84aa9b474]
/lib64/libc.so.6(+0x8156b)[0x7ff84aa9d56b]
/cvmfs/larsoft.opensciencegrid.org/products/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libRIO.so(_ZN14TFileCacheReadD1Ev+0x147)[0x7ff84edbb547]
/cvmfs/larsoft.opensciencegrid.org/products/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libTree.so(_ZN10TTreeCacheD0Ev+0x12)[0x7ff84d5430b2]
/cvmfs/larsoft.opensciencegrid.org/products/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libTree.so(_ZN5TTreeD2Ev+0x145)[0x7ff84d562245]
/cvmfs/larsoft.opensciencegrid.org/products/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libTree.so(_ZN5TTreeD0Ev+0x12)[0x7ff84d562902]
ConvertToTMSTree.exe(_Z16ConvertToTMSTreeNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_+0x36c9)[0x40b2a9]
ConvertToTMSTree.exe(main+0xc6)[0x407946]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ff84aa3e555]
ConvertToTMSTree.exe[0x407a8d]

*** Error in `ConvertToTMSTree.exe': munmap_chunk(): invalid pointer: 0x000000000e2ac270 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7f85d6c59474]
ConvertToTMSTree.exe(_ZN9TMS_EventD1Ev+0x3a2)[0x40dc52]
ConvertToTMSTree.exe(_Z16ConvertToTMSTreeNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_+0x192a)[0x40950a]
ConvertToTMSTree.exe(main+0xc6)[0x407946]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f85d6bfc555]
ConvertToTMSTree.exe[0x407a8d]

LiamOS commented 7 months ago

I have merged it onto liam_dev to play around with. If possible, could you post the files, commands, and environment you were running with to get this crash?

In theory the float issue is solvable with some trickery, through renormalising things, or adding some extra bits in a clever way. We can discuss what exactly is needed sometime that suits you, whether that's soon or when you're back.

jdkio commented 7 months ago

Hopefully you can fix this before I get back in two months. I think we're going to want to run with a proper pileup simulation soon. Sorry for leaving it incomplete.

ConvertToTMSTree.exe /dune/data/users/abooth/Postdoc/Production/MiniProdN1p2-v1r1/run-spill-build/output/MiniProdN1p2_NDLAr_1E19_RHC.spill/EDEPSIM_SPILLS/MiniProdN1p2_NDLAr_1E19_RHC.spill.00001.EDEPSIM_SPILLS.root

You may want to ditch all this code and do it another way if it keeps giving you trouble. One idea would be to run a spill-building step before dune-tms that loads the edep-sim file and adds the events together into a single event per spill.
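As a rough illustration of that idea, here is a sketch of a standalone grouping step. It does not use the edep-sim or dune-tms classes: SimHit, SimEvent, and MergeEventsBySpill are hypothetical stand-ins. It assumes that events arrive ordered in time, that spills are spaced by a fixed period, and that an event's first hit identifies its spill. Hit times are stored relative to the start of their spill, which also sidesteps the float issue.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-ins for the real hit and event classes.
struct SimHit {
  double time;  // absolute time, same units as the spill period
};

struct SimEvent {
  std::vector<SimHit> hits;
};

// Merge time-ordered events into one event per spill.
// Assumes spill k covers [k * spill_period, (k + 1) * spill_period).
std::vector<SimEvent> MergeEventsBySpill(const std::vector<SimEvent>& events,
                                         double spill_period) {
  std::vector<SimEvent> spills;
  bool have_spill = false;
  long long current_index = 0;

  for (const SimEvent& event : events) {
    if (event.hits.empty()) continue;  // nothing to place in a spill

    // Use the first hit to decide which spill this event belongs to.
    const long long index = static_cast<long long>(
        std::floor(event.hits.front().time / spill_period));

    // Start a new merged event whenever the spill index changes.
    if (!have_spill || index != current_index) {
      spills.emplace_back();
      current_index = index;
      have_spill = true;
    }

    // Copy hits over with times relative to the start of the spill.
    const double spill_start = static_cast<double>(index) * spill_period;
    for (const SimHit& hit : event.hits) {
      spills.back().hits.push_back(SimHit{hit.time - spill_start});
    }
  }
  return spills;
}
```

With a period of 1.2e9, three input events whose hits sit at 10, 20, 30 and at 1.2e9 + 5 would come out as two merged events: one with three hits and one with a single hit at relative time 5. A real version would need to carry over trajectories and truth information as well, which this sketch ignores.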