Decide which code branch to use for final rounds of processing

SFBayLaser commented 3 weeks ago

The data and nominal MC for Neutrino '24 was processed with icaruscode v09_84_00_03 while more recent detector variations utilized icaruscode v09_89_01_01p02.

We need to decide which of these two code versions to use as the branch point for the final effort.

SFBayLaser commented 3 weeks ago

It is important to note that v09_84_00_03 canot be used with the SPINE reconstruction has it has the wrong truth labels. SPINE training occurred with v09_89_01_01_p02.

SFBayLaser commented 3 weeks ago

Will we need to branch SBN level repositories as well?

SFBayLaser commented 3 weeks ago

Pros of changing nominal MC version:

Guaranteed match between nominal and detector variations
Opportunity to update some choices (ie: tail response, electron lifetime, others?)

Cons of changing nominal MC version:

Delay introduced by re-analysis (likely large)
Opportunity to update some choices could lead to larger changes relative to existing analysis and further debate as to what to include or not
Requires reprocessing of data (v09_89 in progress but not complete) (from Promita: Run2 reprocessing is being done with v09_89_01. 3 datastreams are complete, 2 are running and rest 3 are not yet run (kept on hold) because the disk space is not in good situation (free sapce ~68TB)

cfarnese commented 3 weeks ago

@SFBayLaser I checked at CNAF and the release used was v09_84_00_01. I suppose this is not so relevant but in this way we can preserve in an addtional place the info! Other question: are the variations using v09_89_01_01p02? I understood that the we were using v09_89_01_01 and I submit the request for the new CV with this version... @jzettle can you confirm?

Looking at the pros and cons: in the pros I see the opportunity to update some choices (ie: tail response, electron lifetime, others?) but this means that even the CV recently produced by Promita becomes useless...

@francois-drielsma Can we have more details about the problem with SPINE? or just to be more direct: which is the impact on the analysis made by Justin until now? There is for example an impact on the evaluation of the efficiency? this maybe is an answer you have already provided in the ML working group but unfortunately (my fault) I am not always able to be present to that meeting... At the moment I am more concentrated on what we should do with the 1muNp analysis while for the other analysis I agree we can consider some changes with respect to the big productions already done...

jzettle commented 3 weeks ago

I always requested just v09_89_01_01 in any production myself. I don't clearly see any differences with v09_89_01_01p02 that matter for the variations but I might be missing details

francois-drielsma commented 3 weeks ago

@jzettle from the diff (https://github.com/SBNSoftware/icaruscode/compare/v09_89_01_01...v09_89_01_01p02), it seems the following two changes occurred:

Moved from icarusalg v09_88_00_02 to icarusalg v09_89_01_01
- This does not seem to do anything (https://github.com/SBNSoftware/icarusalg/compare/v09_89_01_01...v09_88_00_02)
Updated NuMI flux files (important for NuMI productions only)

Bottom line: does not matter for BNB in anyway as far as I can tell

francois-drielsma commented 3 weeks ago

@cfarnese I will again strongly challenge the recurring resource con argument. The existing "nominal" CNAF sample is not a big production by any definition. 330k is a small amount of statistics by any metric and is something that can (and has) been produced in a few days. Please recognize that the amount of resources needed to produce this nominal sample will be dwarfed by the amount required to produce O(10) detector variations, each of which contains 200k nu + cosmics event (totaling > 2M events). This is not a good argument to use as a basis to choose between making a new nominal sample or not, I'd like to avoid us muddying the waters of this conversation with it.

The only argument I am willing to hear on this issue is our understanding of the v09_84 sample w.r.t the v09_89 sample. My personal disconnect here is that you (and others) are making (in my view) two contradictory statements:

"We have studied the v09_89 variation samples sufficiently to demonstrate that it is valid to use v09_89 to evaluate detector systematics on a nominal sample that has been produced with v09_84"
"We have not studied v09_89 sufficiency to transition the nominal sample from v09_84 to v09_89"

What are the validations that need to be done on v09_89 that are specific to the nominal sample and are not required to validate that v09_89 is a valid release to evaluate detector variations with? Do we agree at least philosophically that it is very bad practice to mix software releases of simulations when producing a physics analysis? Two scenarios:

The releases are definitely equivalent: why not make the switch? This will put any concern of discrepancy between releases to bed
The releases are definitely not equivalent: you absolutely cannot mix the versions together when producing physics results

The details of how versioning affects the SPINE analysis are almost irrelevant in this context. Harmonizing the software release is simply the right thing to do and something that should have been done months ago. Given the urgency of Neutrino, there was an argument to go with what we had, but the urgency argument cannot possibly remain relevant on ICARUS ad vitam aeternam.

To answer your question though, here are the other upsides of switching to the more recent release:

Fix the lifetime in all simulations (nominal and variations) to the correct number. The current nominal sample uses 3 ms, which is definitely wrong for run 2. We could do 8 ms if we go W only, 6 ms if we take the average of the two. This will in turn reduce the impact of the lifetime detector systematics on the measurement.
Fix the signal response in all simulations (I understand it is a small effect, but it is a freebie we should take)
Fix the shower labeling for SPINE. This issue manifests itself in the truth merging tracks with their daughters (deltas and michels merged into tracks) and the lack of granular information in the shower development (only information about the primary fragment is stored). If we fix this now, it will allow us to use this sample for other analyses (michel, nue and pi0) and will ensure continuity with future SBN-wide analyses.
If we get it right now, we will not need to reproduce a full suite of detector variations (which is not a negligible amount of work) in the near future for other BNB analyses (nue selection coming soon, pi0 too)

mrmooney commented 3 weeks ago

Since GitHub is now a repository of opinions in addition to code:

I agree with the entirety of Francois' comment directly above. I feel that these points have been made before, and no sufficient response given to them. Instead, it has been stated, inexplicably, that we would "hold up the analysis for six months" by going down this path, which I simply don't understand. If it does hold up the analysis, we should be glad that we did not put out an erroneous result by virtue of going down this route.

Also, for the record, I am among those that would like for this result to go out as quickly as possible. To me, this means as quickly as we can be sure we are putting out something that is correct.

cfarnese commented 3 weeks ago

@francois-drielsma Concerning the production: I agree that we have so many variations to produce that enlarge the CV is not a problem.. but what about the data, that are presently not fully available? Are we saying that we compare data with 84 and MC with 89? Please note that I never declared that "We have not studied v09_89 sufficiency to transition the nominal sample from v09_84 to v09_89" but I simply think that there is a consistency in the approach to make the data/MC comparison with the 84 and to estimate the detector systematic errors from the 89, as was decided months ago, because the variations are referred to a CV that is made with exactly the same code and we understand (with the error I discover) that now the CV of the 84 and 89 are essentially compatible... What I also said is that if we decided to go to 89 for the data/MC comparison, It is appropriate to repeat the detailed checks that has been done in THIS kind of work (I mean the comparison between data and MC, all the plots for example made by Maria and Justin) that probably will provide very similar or even identical results but I don't think is reasonable to skip if you move everything to 89... If we need to do so it means we need to wait for the readiness of the data (and that means that in addition to the big variations and the increase of the CV you need to complete the data, requiring additional time) and I suppose people will say that since we consider a new production we have to restart from the 10% and wait for the passage to the 30% of the control sample (that we have already done in the present status)... this is even also true if you are changing the kind of CV that you are using, changing for example the electron lifetime or the signal shape...

I want also to mention that the production of the most recent CV I requested has started on Oct 10 and on Oct 17 we have 190k events... If you want to change the CV this should be restarted requiring additional days...

Keep in mind I repeat that my opinion is strictly related to 1muNp analysis where we are really progressing rapidly in the direction of the next blinding policy step... I understand in any case my opinion is unpopular but I think it is also in the interest of the analyzers that have already made a lot of work and should restart when they are approaching the finish line...

cfarnese commented 3 weeks ago

Just a final couple of comments: my idea to continue to use the data/MC comparison with 84 doesn’t mean to stop any other study different from 1muNp… I strongly support the idea to enlarge the CV that is presently under production by Promita, to be used for all the other studies with the correction for the truth matching required for the ML analysis to allow progress on all the very important studies ongoing for example the nue… it was really unfortunate we have found this bug in the way we made productions but let me say that at least for the 1muNp, this bug is affecting only the detector systematic evaluation but this experience has already demonstrated that some of them are providing small contributions and maybe are not so urgent to reproduce them for the moment, we can skip to reproduce them, as mentioned at the CM… concerning instead the change of the CV, it seems to me (also from the studies we have performed) the impact of the mentioned changes are relatively small (with respect to the CV we are presently producing, the impact on the signal shape is a small increase on hit efficiency only in ind2 and the change of purity has a small impact as seen in the variation with the nu only sample) so in my (unpopular) opinion We can maybe think to introduce these changes in a future round of production… in any case this is only my opinion we can better discuss in more proper ways and involving also others…

francois-drielsma commented 3 weeks ago

@cfarnese Thank you for addressing my comments.

On the data reprocessing: @PromitaRoy tells me that the BNB data (on-beam and off-beam) have been 50% reprocessed with v09_89 and that it would take little time to finish the production. The only reason it was held up was due to disk space, but if it is deemed a priority (which I believe it is), she will release the campaign and the files will be produced quickly. Could you clarify which version of the data processing you are using for the BNB on-beam and off-beam datasets currently? Are you using both and if so have they both been processed with v09_84?

I feel that I am still not getting a straight answer on the question I asked in my previous message. I understand that the detector systematics are estimated w.r.t. a CV sample that is produced with v09_89. What I do not understand is how you can trust that using v09_89 to evaluate systematics on a nominal sample produced with v09_84 is a valid approach? How are you validating that both versions are equivalent enough that you can use one version to estimate detector systematics on another? Again, either they are equivalent and we should harmonize the analysis or they are not and we cannot possibly mix software releases.

I would also like to understand the urgency to get this paper published. Either this paper is presenting meaningful physics results and we should do it right (not mix software releases and put our best foot forward in terms of systematics) or it is not meaningful physics in which case: why are we trying to push it out in the next 2 months, or push it out at all for that matter? As usual, "do it right or don't do it at all" is my stance on this matter.

As for the time it takes to reproduce the CV, were we to change the setting: if it indeed takes 1 week to create 200k events, it will take >10 weeks (2.5 months) to do all detector variations. The point I made in my previous message still stands: either we do not have time to create the detector variations within the absurd timeline we have set for this paper to be published, or we definitely have time to recreate a CV sample, as it represents a small fraction of the overall statistics we have to produce for it.

Let's talk about this during the 10AM CT meeting tomorrow.

cfarnese commented 3 weeks ago

Thanks @francois-drielsma for your reply! I try to answer to your questions!

First of all the Offbeam data: in the analysis Maria is doing, these events have been processed by Promita with the 89 version... this sample has been used to demonstrate that the contribution of this kind of cosmics is about 0.3% in the 1muNp sample so let me say it is really negligible...

Concerning the idea of the detector systematics with the 89: I think that here the point is the way in which we are calculating the systematics! We are estimating them looking at the effect when you change a parameter with respect of the CV... the fact that we saw that the CV are almost identical in 84 and 89 guarantees (as a first approximation) that the systematics variation seen in the 89 should be similar in 84... this was the approach that was discussed and agreed at the time of the Neutrino conference...

Finally concerning the timing of the variations production: if I understand correctly, the time to produce the variations should be smaller with respect to the time to produce the CV because part of the production has been already made (you are not repeating the g4 stage or at least part of it)... so hopefully it will take less that 10 weeks to produce them... I think here the problem in general are the resources: we should try to optimize the productions in order to be able to pursue all our analysis, completing the ones that are already advanced and providing the proper materials for the all the analyzers...

Concerning the paper, I hope this will be an interesting results, we are working a lot for this!! I hope also soon the management can decide in which directions we should move!! For the moment thanks for the chat and sorry, I know my english is not always so good so I hope I express myself in a proper and clear way...

Last but not least: it is not clear to me: what do you mean by "Let's talk about this during the 10AM CT meeting tomorrow.", at which meeting are your referring to?

etworcester commented 3 weeks ago

Summary of discussion this morning: unless showstoppers arise (in either direction) we follow "scenario 3" in Daniele's document, which in brief, moves all data and MC analysis (nominal and systematic variations) to v89, where v89 is branched to allow required functionality. But, no content changes will be included, ie keep same electron lifetime and tail response. In principle this minimizes extra burden on analyzers as the new data/MC should be the same as was shown at Neutrino and we do not have to repeat approval steps if this is demonstrated to be true.

jzettle commented 3 weeks ago

Would it be possible to share the document from this morning with at least the group from the meeting, so we can have access to the different scenarios?

SFBayLaser commented 3 weeks ago

Scenario_Oct2124.pdf

SFBayLaser commented 3 weeks ago

I've included the document from the meeting of October 21 above.

This is the message from Daniele:

Hi all,

I attach the file shown yesterday during the meeting.

No objections were raised against the principles exposed in the file. Point 3 and 4 illustrate the goals of the numu oscillation analysis for the online and in person workshops respectively.

It was decided to extend the second scenario and adopt the third one which is using the version v_09_89 of the code. All tests performed up to now make us confident that no significant difference in the analysis should show up w.r.t. to the version v_09_84 used so far for the Data/MC comparison. For this reason, it was decided that, unless we discover unexpected problem, for the numu analysis it is not required to restart the unblinding procedure from 10% of the Run2 data and then with 30% for the control sample. The completion of this scenario involves some steps still not fully defined timewise which could prevent reaching the point 3 and then 4. If this scenario will not be completed in time for the workshop, it will still be possible to backtrack to second scenario. It is important to accurately plan the sequence of necessary productions in order to maximize the probability of a timely achievement of the goals for the workshops by following the sequence for the production as indicated for the scenario 2.

Best, Daniele

SFBayLaser commented 3 weeks ago

The agreement is to use v09_89_01_01 branch (the release/SBN2024A) which is currently at v09_89_01_01p02.

@cfarnese and Maria have vetted data reconstruction of v09_84 vs v09_89 with no differences found, the agreement is to lower the priority of reprocessing data as current studies can proceed with the v09_84 already made.

SBNSoftware / icaruscode

Decide which code branch to use for final rounds of processing #763