elaird / supy

analyze events stored in TTrees in parallel

Duplicate Events Bug #156

Closed · betchart closed this issue 11 years ago

betchart commented 11 years ago

I noticed, while running over multiple versions of an analysis for evaluation of systematic uncertainties, that I didn't get the same number of events in each version. Upon investigation, there were duplicate events, and the number of duplicates depended on the number of slices, but randomly rather than as a simple function of it.

After burning a day to track it down, I found the problem: when the last file in the chain has zero events, we were calling LoadTree with an invalid event number, so the last tree (the one with zero events) fails to load and we loop over the second-to-last tree again instead. In fact, that last non-empty tree can be looped over as many times as there are empty trees after it.

I investigated and found that all of the simulation ttrees contain a non-zero number of events, while a significant fraction (0-10%) of the data ttrees contain zero events. The jobs with zero ttree events still have a non-zero number of lumis in the lumiTree. Presumably those lumisections are not certified anyway, which is why I might not have noticed this problem before: I had put a JSON lumi-filter in CMSSW.

The fix is trivial. You may want to think about whether you have had empty ttrees in your inputs.
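
For reference, here is a rough PyROOT sketch of the failure mode and the kind of guard that avoids it. This is not the actual supy patch; the tree name, file names, and the off-by-one loop bound are placeholders standing in for "calling LoadTree with an invalid event number".

```python
import ROOT

# A chain whose last file holds an empty tree (placeholder file names).
chain = ROOT.TChain("Events")
chain.Add("nonEmpty.root")
chain.Add("empty.root")  # tree present, zero entries

nEntries = chain.GetEntries()

# Pretend the loop bound overshoots, so LoadTree is eventually asked for
# an entry that does not exist.  Checking its return value stops the loop
# instead of silently re-reading the previous, non-empty tree.
for iEntry in range(nEntries + 1):
    localEntry = chain.LoadTree(iEntry)  # negative return code on failure
    if localEntry < 0:
        break
    chain.GetEntry(iEntry)
    # ... process the event ...
```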

gerbaudo commented 11 years ago

Thanks!

elaird commented 11 years ago

Hi Burt,

Thanks for tracking it down. I wrote a test [1], but was unable to reproduce the duplicates problem. Could you expand the test, or add a different one?

Ted

[1] https://github.com/elaird/supy/pull/157

betchart commented 11 years ago

I forced it to fail in the same way I was seeing just by adding a call to GetEntries() prior to looping. That is a pretty flaky way to break only sometimes, if you ask me. See the other pull request.
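
For anyone else trying to reproduce it, a minimal sketch of the trigger described above (placeholder names; the actual test lives in the pull request):

```python
import ROOT

chain = ROOT.TChain("Events")   # placeholder tree name
chain.Add("nonEmpty.root")
chain.Add("empty.root")         # last file contains a zero-entry tree

# The extra call that exposed the problem: asking the chain for its entry
# count before starting the event loop was enough to make the
# duplicate-events failure show up.
chain.GetEntries()

# ... then run the usual event loop over the chain ...
```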

elaird commented 11 years ago

Curious. I'll merge now.