Feature/howard trees with cov rebased

brucehoward-physics commented 8 months ago

After recent updates to tidy up the usage of the CovarianceMatrixTrees, I am finally opening a PR to merge the Trees paradigm into CAFAna/SBNAna. Many of the commits are from months ago, though they may appear recent because I rebased onto develop and made a new branch rather than committing directly from the various releases/features where we have been using it. I believe this feature has been useful to some folks beyond @jedori0228 and me, so it will be nice to have it merged even if there will be further updates :)

There is one commit here which I’ll note is a little bit unrelated to the Trees paradigm (e36cfbb426b688898033a8f33b030b8e97e5628f). This commit makes a change to UniverseOracle which enables one to use different sets of systematics.

The paradigm is as follows: most usage of CAFAna utilizes the Spectrum class or its derivatives. These are sort-of histogram objects and very useful for the CAFAna fitting, and even offer EnsembleSpectrum for handling multiverse systematics together. However, in a number of cases, a more extensible format is better, e.g.:

Having a pruned set of slices or spills, e.g., with a set of calculated Vars or even direct info from the CAFs that you want to explore for use in an un-binned format, or when you haven’t defined the binning yet. Relatedly, extensible plotting capabilities.
Making correlations or combinations of multiple variables from such a pruned set
Feeding info to external fitters/frameworks for analysis (e.g. for the use of GUNDAM for cross-section studies, or Profit for oscillation analysis)

Enter the various Tree classes being added in this PR. The user can define, for example, a common Cut and specify a vector of Vars to be evaluated with that cut, with the results saved as individual entries in an output ROOT TTree. The primary class being added is Tree, which provides this basic functionality. The user can use Vars, MultiVars, SpillVars, or SpillMultiVars, but only one type at a time (the list of supported types will grow when a new variable type is added soon).
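To make the "one Cut, many Vars, un-binned output" idea concrete, here is a minimal standalone sketch in plain C++. It does not use the sbnana classes: SliceRecord, ToyVar, ToyCut, ToyTree, and FillToyTree are hypothetical names invented for illustration, and a std::map of columns stands in for the output TTree.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one slice record; not an sbnana/CAF type.
struct SliceRecord {
  double recoMomentum;
  double trueMomentum;
  bool isSignal;
};

// Toy analogues of a Var (record -> value) and a Cut (record -> pass/fail).
using ToyVar = std::function<double(const SliceRecord&)>;
using ToyCut = std::function<bool(const SliceRecord&)>;

// Column-wise, un-binned table: one vector per label,
// one element per record that passes the cut.
using ToyTree = std::map<std::string, std::vector<double>>;

ToyTree FillToyTree(const std::vector<std::string>& labels,
                    const std::vector<ToyVar>& vars,
                    const ToyCut& cut,
                    const std::vector<SliceRecord>& records) {
  assert(labels.size() == vars.size());  // one label per variable
  ToyTree tree;
  for (const auto& label : labels) tree[label];  // create empty columns
  for (const auto& rec : records) {
    if (!cut(rec)) continue;  // the common cut is applied once per record
    for (std::size_t i = 0; i < vars.size(); ++i) {
      tree[labels[i]].push_back(vars[i](rec));  // every var sees the same record
    }
  }
  return tree;
}
```

Because every column is filled from the same passing records, row i of each column refers to the same object, which is what makes later correlations or hand-off to an external fitter straightforward.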

There is also a CovarianceMatrixTree class added which inherits from Tree and enables the user to produce a covariance matrix. The foreseen usage here (at least at the moment) is to make a set of SpillMultiVars, most of which are used to determine the binning for a given entry and one with N “universes” of weights for each such entry. (E.g. we have a spill with 4 objects - e.g. slices - passing criteria in my SpillMultiVar and I have 2 Vars used to determine the binning. So, 4 entries for each of these 2 Vars. Then, I have the Var I want to do the covariance for and say we have 100 universes. Then I’ll have 100 universes for slice 1, 100 for slice 2, etc.)
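As a rough illustration of how per-entry universe weights turn into a covariance matrix, the sketch below sums the weights into bins for each universe and then takes the covariance of the binned counts across universes. This is not the CovarianceMatrixTree implementation: ToyEntry, Matrix, and ToyCovariance are hypothetical names, and taking deviations about the universe mean (rather than about a central-value prediction) is an assumption made here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One selected object (e.g. a slice): the bin it falls in and
// one weight per systematic universe. Hypothetical type for illustration.
struct ToyEntry {
  std::size_t bin;
  std::vector<double> universeWeights;
};

using Matrix = std::vector<std::vector<double>>;

Matrix ToyCovariance(const std::vector<ToyEntry>& entries,
                     std::size_t nBins, std::size_t nUniv) {
  assert(nUniv > 0);

  // counts[u][b]: weighted number of entries in bin b for universe u.
  Matrix counts(nUniv, std::vector<double>(nBins, 0.0));
  for (const auto& e : entries) {
    assert(e.bin < nBins && e.universeWeights.size() == nUniv);
    for (std::size_t u = 0; u < nUniv; ++u) {
      counts[u][e.bin] += e.universeWeights[u];
    }
  }

  // Mean count per bin over universes (assumed reference point).
  std::vector<double> mean(nBins, 0.0);
  for (std::size_t u = 0; u < nUniv; ++u) {
    for (std::size_t b = 0; b < nBins; ++b) {
      mean[b] += counts[u][b] / static_cast<double>(nUniv);
    }
  }

  // cov[i][j] = (1/N) * sum_u (n_i^u - mean_i) * (n_j^u - mean_j)
  Matrix cov(nBins, std::vector<double>(nBins, 0.0));
  for (std::size_t u = 0; u < nUniv; ++u) {
    for (std::size_t i = 0; i < nBins; ++i) {
      for (std::size_t j = 0; j < nBins; ++j) {
        cov[i][j] += (counts[u][i] - mean[i]) * (counts[u][j] - mean[j]) /
                     static_cast<double>(nUniv);
      }
    }
  }
  return cov;
}
```

In the 4-slice, 100-universe example above, entries would hold 4 ToyEntry objects for that spill, each with 100 weights, and the bin index would come from the Vars that define the binning.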

Then, there is the WeightsTree and its inheritors: NSigmasTree and NUniversesTree. These enable systematics to be considered, the former for systematics that can be treated as discrete sigma variations and the latter for systematics treated in a “multiple universes” fashion. The NSigmasTree weights can currently be saved in the tree as a vector of the weights, as a TSpline3, as a TGraph, or as a TClonesArray (utilized in GUNDAM). For multiverse systematics, only a vector of the weights can be saved at present. One could then use these to build a covariance matrix, or one could also build SpillMultiVars so as to use the CovarianceMatrixTree (e.g. these 2 utilities have been under exploration by @jedori0228 and me for Geant4 systematics using geant4reweight). Additionally, the MergeTree function in WeightsTree enables one to join the systematics back with the rest of the considered variables from the entries, e.g. those in the more standard Tree(s).
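For the sigma-variation case, the stored weights are samples of a response function at discrete sigma values, and a downstream fitter evaluates that function at arbitrary dial values. The sketch below shows the idea with piecewise-linear interpolation between integer sigma points, similar in spirit to evaluating a TGraph; a TSpline3 would use a cubic spline instead. ToyDial and EvalWeight are hypothetical names, and clamping outside the sampled range is a choice made here, not something taken from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weights for one entry and one dial, sampled at the integer sigma values
// sigmaMin, sigmaMin + 1, ..., sigmaMax. Hypothetical type for illustration.
struct ToyDial {
  int sigmaMin;
  int sigmaMax;
  std::vector<double> weights;
};

// Piecewise-linear interpolation of the weight at an arbitrary sigma.
double EvalWeight(const ToyDial& dial, double sigma) {
  assert(dial.sigmaMax > dial.sigmaMin);
  assert(dial.weights.size() ==
         static_cast<std::size_t>(dial.sigmaMax - dial.sigmaMin + 1));

  // Assumption: hold the edge value outside the sampled range.
  if (sigma <= dial.sigmaMin) return dial.weights.front();
  if (sigma >= dial.sigmaMax) return dial.weights.back();

  const double offset = sigma - dial.sigmaMin;  // distance from first knot
  const std::size_t lo = static_cast<std::size_t>(std::floor(offset));
  const double frac = offset - static_cast<double>(lo);
  return (1.0 - frac) * dial.weights[lo] + frac * dial.weights[lo + 1];
}
```

Since each ToyDial carries its own sigmaMin and sigmaMax, different dials can cover different sigma ranges, mirroring the per-dial range pairs passed to the NSigmasTree constructor.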

Many details are available in an old talk on doc-db (doc-32544), and I would be happy to discuss it again, with updates and further information, at an upcoming SBN Analysis Frameworks meeting (tagging e.g. @PetrilloAtWork @JosiePaton). We had a preliminary discussion jointly with GUNDAM-focused analyzers and Profit developers/users a while back, though I should note that this PR includes some of the things that were discussed but certainly not everything. For example, one now has the ability to use different sigma ranges for different dials (it’s controllable by a parameter in the constructor). However, this does not yet have the ability to make an n-dimensional spline.

As an example of basic usage:

// … includes and other definitions of things like the loader, cuts, vars, etc.

// Un-binned tree: one label per Var, all evaluated under the same spill and slice cuts.
Tree myTree( "MyTree", {"kSliceCategory/i", "kIsSignal/i", "kRecoMomentum", "kTrueMomentum"},
             loader, {kSlcCategoryVar, kIsSignalVar, kRecoMomentumVar, kTrueMomentumVar},
             kMySpillCut, kMySlcCut, kNoShift, true, true );

// One systematic dial with its own sigma range, here (-3, 3).
std::vector<std::string> systNames = { "GENIEReWeight_ICARUS_v2_multisigma_ZExpA1CCQE" };
std::vector<const ISyst*> systObjs;
systObjs.push_back( new SBNWeightSyst(systNames[0]) );
std::vector< std::pair<int,int> > systSigmas;
systSigmas.push_back( std::make_pair(-3, 3) );

NSigmasTree mySysts( "MySigmaSysts", systNames, loader, systObjs, systSigmas,
                     kMySpillCut, kMySlcCut, kNoShift, true, true );

// Fill both trees in a single pass over the input.
loader.Go();

// Join the systematic weights with the variables in myTree, then write the result.
TFile *outFile = new TFile("myFile.root", "recreate");
mySysts.MergeTree( myTree );
mySysts.SaveTo( outFile->mkdir("Merged") );

brucehoward-physics commented 7 months ago

Updated those names, thanks @jacoblarkin

I noticed while doing this that there seem to be more systematics for BNB flux known in the files (at least from last year) based on the printouts than are grabbed by this function. Just checking if that's purposeful?

jacoblarkin commented 7 months ago

Thanks for adding that.

If you mean the piplus, piminus, kplus, kminus, and kzero systematics, that's intentional. There is an external macro that does a PCA on those and we read in the results of that.

brucehoward-physics commented 7 months ago

OK cool!

SBNSoftware / sbnana

Feature/howard trees with cov rebased #106