JeffersonLab / halld_recon

Reconstruction for the GlueX Detector

Repeated runs not giving the same results #492

Open rjones30 opened 3 years ago

rjones30 commented 3 years ago

Repeated runs with the standard libraries and plugins for the Summer 2020 raw data reconstruction campaign do not produce the same number of entries in the histograms in hd_root.root. Below are repeated tests with the full complement of 29 plugins, using the standard version_4.24.0.xml binaries and a single thread, over block number 336 of hd_rawdata_071728_000.evio. Each table shows, for every histogram type in hd_root.root, the number of objects, the total number of entries, and the memory size, as reported by my rootmem.py utility. All repeats were run under identical conditions: one worker thread and one EVIO decoding thread.

Trial 1:

ROOT object   count   entries  memory size
TH1D            174    145399       130864
TH1I           2583   2374053      9005768
TProfile2D      270    150392    113716800
TH1F             30     20582        91312
TH2I          13172   3791153   6932110544
TH2F             55    648200     25954536
TH2D            657     87785    225815552
TTree             2        12         2000
TProfile         25     52617       346176
TH2S             96     28905      2654208
Total         17064   7299098   7309827760

Trial 2:

ROOT object   count   entries  memory size
TH1D            174    145077       130864
TH1I           2583   2373890      9005768
TProfile2D      270    150392    113716800
TH1F             30     20582        91312
TH2I          13172   3787288   6932110544
TH2F             55    648200     25954536
TH2D            657     87628    225815552
TTree             2        12         2000
TProfile         25     52617       346176
TH2S             96     28905      2654208
Total         17064   7294591   7309827760

Trial 3:

ROOT object   count   entries  memory size
TH1D            174    145139       130864
TH1I           2583   2373822      9005768
TProfile2D      270    150392    113716800
TH1F             30     20582        91312
TH2I          13172   3790748   6932110544
TH2F             55    648200     25954536
TH2D            657     87659    225815552
TTree             2        12         2000
TProfile         25     52617       346176
TH2S             96     28905      2654208
Total         17064   7298076   7309827760

Notice that the numbers of entries differ in several histogram types (TH1D, TH1I, TH2I and TH2D), while the object counts and memory sizes are identical across trials, so there should be no difficulty finding a specific example to drill down on.
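To make the comparison between trials explicit, here is a minimal sketch that tabulates the entries column of the three tables above and lists the types whose totals are not the same in every trial. The numbers are copied from the tables; the names `TRIALS` and `differing_types` are only illustrative and are not part of rootmem.py.

```python
# Entries per histogram type, copied from the three trial tables above.
TYPES = ["TH1D", "TH1I", "TProfile2D", "TH1F", "TH2I",
         "TH2F", "TH2D", "TTree", "TProfile", "TH2S"]

TRIALS = {
    "Trial 1": [145399, 2374053, 150392, 20582, 3791153,
                648200, 87785, 12, 52617, 28905],
    "Trial 2": [145077, 2373890, 150392, 20582, 3787288,
                648200, 87628, 12, 52617, 28905],
    "Trial 3": [145139, 2373822, 150392, 20582, 3790748,
                648200, 87659, 12, 52617, 28905],
}


def differing_types(types, trials):
    """Return {type: [entries in each trial]} for types that are not stable."""
    result = {}
    for column, root_type in enumerate(types):
        counts = [entries[column] for entries in trials.values()]
        if len(set(counts)) > 1:  # at least two trials disagree
            result[root_type] = counts
    return result


if __name__ == "__main__":
    for root_type, counts in differing_types(TYPES, TRIALS).items():
        spread = max(counts) - min(counts)
        print(f"{root_type:<12} {counts}  spread={spread}")
```

Types that do not show up in the output have the same total number of entries in all three trials, which narrows down where to look first.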

rjones30 commented 3 years ago

I narrowed down the differences in the reconstruction output between repeated identical runs over the same raw data to the way that dEdx information is reported for hits in the CDC. The dEdx hits lists in my debug printouts are identical between the two trials for most events, but not all: about one track in 100 shows a difference somewhere in its CDC dEdxHits list. Using vimdiff, I made a detailed comparison between the two lists. This is what I see.

  1. All of the differences lie in the CDC hits lists; the FDC hits lists are all identical in this sample of 923 events.
  2. Most of the differences can be explained as insertions/deletions: some hits included in one list are not found in the other.
  3. Sometimes, but rarely, the two dE values for a hit agree but the dx values are different.
  4. Most of the time the dE and dE_amp values are similar in magnitude, but sometimes the dE value is much larger, e.g. dE=9.323973e-01, dE_amp=1.788133e-06, dx=1.635120.
  5. Most of the time the dx value is reasonable, but sometimes it is too large to be realistic, e.g. dE=6.554818e-06, dE_amp=5.198256e-06, dx=8494.518240.
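The insertion/deletion pattern described in the list above can also be counted automatically rather than by eye in vimdiff. The sketch below assumes each hit has been parsed from the debug printout into a `(dE, dE_amp, dx)` tuple; that record layout and the function name are hypothetical, and the sample values are made up for illustration. It uses the standard `difflib.SequenceMatcher` to classify the differences between the two lists for one track.

```python
from difflib import SequenceMatcher


def classify_hit_differences(hits_a, hits_b):
    """Count equal, inserted, deleted and replaced hits between two lists.

    Each hit is assumed to be a (dE, dE_amp, dx) tuple (hypothetical layout).
    A hit whose dE agrees but whose dx differs is counted as 'replace'.
    """
    matcher = SequenceMatcher(a=hits_a, b=hits_b, autojunk=False)
    summary = {"equal": 0, "insert": 0, "delete": 0, "replace": 0}
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        # For a replaced span, count the longer side of the two.
        summary[tag] += max(a1 - a0, b1 - b0)
    return summary


# Illustrative values only, not taken from the real printout.
trial_1 = [(1.0e-6, 1.1e-6, 0.8), (2.0e-6, 2.1e-6, 0.9), (3.0e-6, 3.1e-6, 1.0)]
trial_2 = [(1.0e-6, 1.1e-6, 0.8), (3.0e-6, 3.1e-6, 1.0)]

if __name__ == "__main__":
    print(classify_hit_differences(trial_1, trial_2))
```

Summing these counts over all tracks would give a quick measure of how much of the discrepancy is due to missing hits and how much to changed values.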

Regarding the large values of dE, these appear not to be so unusual. Here is a plot of the distributions of dE and dE_amp values from this sample of 923 events. I have plotted the x axis on a log scale to make the difference in scales between the two parts of the distribution visible.

[Figures: distributions of the dE and dE_amp values for the sample of 923 events, x axis on a log scale]

The dE_amp distribution looks normal, but what are these dE values way up at one? They make up only 1% of the total statistics on the dE variable, but they should be understood. Even more interesting, they do not appear in the dE_amp distribution. In the debug listing, these exceptional dE values always appear at the end of the CDC hits list for their track, as if the hits list had been ordered by increasing dE, although this is clearly not the case because the ordering is not exact. The corresponding dE_amp values for the same hits look normal.
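The two observations about list position and ordering can be checked per track with a few lines of code. This is a sketch only: it assumes the same hypothetical `(dE, dE_amp, dx)` tuple per hit, and the cut `de_threshold=1e-2` is an arbitrary value chosen to separate the dE values near one from the bulk of the distribution. The last hit in the example uses the anomalous values quoted above; the other two hits are made up.

```python
def outlier_indices(hits, de_threshold=1e-2):
    """Indices of hits whose dE exceeds the (assumed) threshold."""
    return [i for i, (de, _de_amp, _dx) in enumerate(hits) if de > de_threshold]


def outliers_form_tail(hits, de_threshold=1e-2):
    """True if there are outliers and they occupy the last positions of the list."""
    indices = outlier_indices(hits, de_threshold)
    tail = list(range(len(hits) - len(indices), len(hits)))
    return bool(indices) and indices == tail


def is_sorted_by_de(hits):
    """True if the whole list is in non-decreasing order of dE."""
    de_values = [hit[0] for hit in hits]
    return all(a <= b for a, b in zip(de_values, de_values[1:]))


# First two hits are illustrative; the last one is the example quoted above.
track_hits = [
    (5.2e-6, 5.0e-6, 1.2),
    (3.1e-6, 3.0e-6, 0.9),
    (9.323973e-01, 1.788133e-06, 1.635120),
]

if __name__ == "__main__":
    print(outlier_indices(track_hits))     # positions of the large-dE hits
    print(outliers_form_tail(track_hits))  # are they all at the end?
    print(is_sorted_by_de(track_hits))     # is the full list ordered in dE?
```

A track where `outliers_form_tail` is true but `is_sorted_by_de` is false matches the pattern described above: the large dE values sit at the end of the list without the list being ordered in dE.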