SBNSoftware / sbncode

SBN Analysis Software

CAFMaker blinding random stream uses the same random numbers as the "fake" reconstruction #306

Open PetrilloAtWork opened 1 year ago

PetrilloAtWork commented 1 year ago

The blinding code introduced in #287 uses a random stream to decide whether each event is going to be kept unchanged (i.e. left unblinded).

The bug is that its random seed may end up being the same as that of the other random stream (the one used for the "fake" reconstruction): https://github.com/SBNSoftware/sbncode/blob/a38ad525c77ffaed381e34fab6597e3a37cbec77/sbncode/CAFMaker/CAFMaker_module.cc#L338. A different instance should be used instead. As it is, it is not even clear to me whether, running the job twice, the same events would be left unblinded.
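To illustrate the failure mode (this is a minimal standalone sketch, not the actual CAFMaker code; the seed value and variable names are hypothetical): two ROOT streams built from the same seed march in lock-step, so the blinding decision and the fake-reconstruction smearing end up consuming the very same numbers.

```cpp
// Minimal illustration (not the CAFMaker code) of the seed-sharing problem:
// two TRandom3 streams built from the same seed produce identical sequences,
// so "blinding" and "fake reco" decisions become correlated.
#include "TRandom3.h"
#include <iostream>

int main() {
  unsigned int seed = 12345;        // hypothetical job-level seed

  TRandom3 blindRand(seed);         // stream deciding which events stay in the clear
  TRandom3 fakeRecoRand(seed);      // stream smearing the "fake" reconstruction

  for (int event = 0; event < 5; ++event) {
    double keepClear = blindRand.Uniform();    // e.g. keep clear if below prescale fraction
    double smear     = fakeRecoRand.Uniform(); // identical value: the streams are locked
    std::cout << "event " << event << ": blind=" << keepClear
              << " fakeReco=" << smear << '\n';
  }

  // A quick decoupling would be distinct seeds per purpose
  // (e.g. seed and seed + 1); a framework-managed per-purpose engine,
  // sketched further below, is the more robust fix.
  return 0;
}
```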

Also, I would have had CAFMaker, which is an art module, use a more standard random engine approach: CLHEP engines instead of ROOT's, taking full advantage of the art services that manage those engines. This may be a topic for a separate issue, and it is not about blinding alone (the same holds for the simplified reconstruction).
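As a rough sketch of the kind of pattern meant here (not a drop-in patch: it assumes an EDProducer-style module, the exact `createEngine` signature varies between art versions, and the `"BlindingSeed"`/`"blinding"` names are purely illustrative), the blinding decision would get its own labelled CLHEP engine, created and seeded through the framework rather than through a hand-rolled ROOT object. In LArSoft-based code the seed itself would typically also be routed through `NuRandomService`, which is omitted here for brevity.

```cpp
// Sketch only: an art module obtaining a dedicated, framework-managed CLHEP
// engine for the blinding decision. Parameter and label names are illustrative.
#include "art/Framework/Core/EDProducer.h"
#include "art/Framework/Principal/Event.h"
#include "fhiclcpp/ParameterSet.h"
#include "CLHEP/Random/RandFlat.h"
#include "CLHEP/Random/RandomEngine.h"

class CAFMakerSketch : public art::EDProducer {
public:
  explicit CAFMakerSketch(fhicl::ParameterSet const& pset)
    : EDProducer{pset}
    // a distinct engine instance, labelled for the blinding purpose only,
    // so its seed cannot silently collide with the fake-reconstruction stream
    , fBlindEngine(createEngine(pset.get<long>("BlindingSeed", 0),
                                "HepJamesRandom", "blinding"))
    , fPrescale(pset.get<double>("PrescaleFraction", 0.1))
  {}

  void produce(art::Event&) override
  {
    // fire() is flat in [0,1): keep the event in the clear (prescaled) stream
    // with probability fPrescale, blind it otherwise
    CLHEP::RandFlat flat(fBlindEngine);
    bool const keepClear = (flat.fire() < fPrescale);
    (void) keepClear; // ... steer the CAF output accordingly ...
  }

private:
  CLHEP::HepRandomEngine& fBlindEngine;
  double fPrescale;
};
```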


I also personally don't love the technical approach, but that's not strictly my call. Anyway, a more "workflow-based" approach would be to have the CAFMaker module write one single stream, either the blinded one or the clear one, instantiate two modules (one for each type), and steer the events to one or the other via a prescale filter. In this way, less complexity is shoved into CAFMaker.
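For concreteness, a prescale filter in this scheme could be as small as the following sketch (module and parameter names are hypothetical, and the `createEngine` call assumes the legacy art interface):

```cpp
// Sketch of the "workflow-based" alternative: a tiny prescale filter that
// routes a configurable fraction of events to a clear (unblinded) path, with
// a dedicated CAFMaker instance on each path writing a single stream.
#include "art/Framework/Core/EDFilter.h"
#include "art/Framework/Principal/Event.h"
#include "fhiclcpp/ParameterSet.h"
#include "CLHEP/Random/RandFlat.h"
#include "CLHEP/Random/RandomEngine.h"

class PrescaleFilter : public art::EDFilter {
public:
  explicit PrescaleFilter(fhicl::ParameterSet const& pset)
    : EDFilter{pset}
    , fEngine(createEngine(pset.get<long>("Seed", 0)))
    , fFraction(pset.get<double>("PassFraction")) // e.g. 0.1 keeps ~10% in the clear
  {}

  bool filter(art::Event&) override
  {
    // pass => event goes down the "clear" path; fail => "blind" path
    return CLHEP::RandFlat(fEngine).fire() < fFraction;
  }

private:
  CLHEP::HepRandomEngine& fEngine;
  double fFraction;
};
```

In the job configuration, one CAFMaker instance would then sit on a path selected by this filter and the other on a path where the filter is vetoed, so each instance writes exactly one stream.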

PetrilloAtWork commented 1 year ago

"Assigned" to @etworcester mostly because she should be aware of the issue. I don't have much time, but I consider myself bound to help.

etworcester commented 1 year ago

Sorry, I just saw this when Wes pinged about it; it seems I don't receive notifications when tagged here. It doesn't really matter to me what random scheme is used, I just wanted something that works, which this does. If @PetrilloAtWork would like to change the algorithm, that is also fine with me. What I want is some way of generating three files with prescaled, blinded, and nominal CAF output, with the POT also adjusted accordingly, where I am sure that all events are assigned to the blind or prescaled files in the right proportions and distributed in time in the same way as the full data.

I did think this would be reproducible (and it seemed to be in my quick checks, though I didn't do a dedicated test), but personally I don't think it is terribly critical whether or not it is strictly reproducible across multiple CAF productions. The idea was for CAFMaker to be run by keep-up (or in additional production passes) with analyzers using the resulting CAF files, so as long as the desired behavior exists in a given production, I think it's OK. But again, feel free to make it better if you like.
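For what strict reproducibility would mean in practice, one possible approach (not what the current code does, and purely a sketch) is to derive the blind/clear decision from the event ID alone, so it cannot depend on processing order or on shared job-level random state:

```cpp
// Sketch: a per-event blinding decision seeded only from the event ID, so the
// same event is always assigned to the same output in every production pass.
// The hash constants are arbitrary; any fixed combination would do.
#include "TRandom3.h"

bool keepEventClear(unsigned int run, unsigned int subrun, unsigned int event,
                    double prescaleFraction)
{
  unsigned int seed = run * 1000003u + subrun * 1009u + event;
  TRandom3 r(seed);
  return r.Uniform() < prescaleFraction;
}
```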

etworcester commented 1 year ago

Also, Jacob pointed out that the fake reco and the blinding will in practice not be run on the same data set (fake reco is MC only, blinding is data only), so that may mitigate the concern about the impact of re-using the random number generator.

wesketchum commented 1 year ago

@PetrilloAtWork can you comment on whether you are happy with the reply or not?

If we think something should still be done, my proposal would be to keep the issue open but target it for a future release, since it sounds like it may not be urgent.

wesketchum commented 1 year ago

OK, very very very very very briefly summing up my understanding of the Slack discussion: