cms-l1t-offline / cms-l1t-analysis


Validation of old vs new #6

Closed kreczko closed 6 years ago

kreczko commented 7 years ago

We need to start comparing the outputs of old and new. Ideally, we would have a publicly available, small (dummy) L1NTuple file that we could run together with the Travis tests. For the moment we can pick one fairly up-to-date file and put something together that works locally, e.g. by taking the ROOT outputs of Shane's scripts and comparing them histogram by histogram to the new output.

kreczko commented 7 years ago

Looking now at this issue, I am thinking of adding two analysers:

  1. legacy_analyzer
     - adjusts the ntuple_cfg
     - compiles macros
     - runs & summarises ROOT files
     - does not use the event loop? Can you think of other use cases for such an analyser?
  2. validation analyzer
     - takes two ROOT files + mapping JSON/YAML
     - produces diff and ratio plots for each given histogram

The validation analyser will be useful for other things, but the legacy analyser will not (at least not in the planned form). I do see a use for an analyser that takes C++ code and uses it to process events, but that is more work than needed at the moment.
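A rough sketch of what the validation analyser core could look like (everything here, including the function name and the PyROOT usage, is only an illustration; it assumes both files contain histograms under the same names):

```python
import os

import ROOT


def compare_files(current_path, reference_path, out_dir="validation"):
    """Produce diff and ratio plots for histograms found in both files.

    Minimal sketch: assumes identical histogram names in both files;
    mapping and filtering are left out here.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)

    current = ROOT.TFile.Open(current_path)
    reference = ROOT.TFile.Open(reference_path)

    for key in current.GetListOfKeys():
        name = key.GetName()
        hist_current = current.Get(name)
        hist_reference = reference.Get(name)
        # skip anything that is not a histogram present in both files
        if not hist_reference or not hist_current.InheritsFrom("TH1"):
            continue

        diff = hist_current.Clone(name + "_diff")
        diff.Add(hist_reference, -1)    # current - reference
        ratio = hist_current.Clone(name + "_ratio")
        ratio.Divide(hist_reference)    # current / reference

        canvas = ROOT.TCanvas(name)
        diff.Draw()
        canvas.SaveAs(os.path.join(out_dir, name + "_diff.png"))
        ratio.Draw()
        canvas.SaveAs(os.path.join(out_dir, name + "_ratio.png"))
```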

kreczko commented 7 years ago

To expand on the validation analyzer: the following functionality needs to be present

  1. compare two existing files with identical structure (ROOT paths & histogram names)
  2. compare two different existing files using a map file in JSON/YAML format
  3. compare only a subset using a filter file (different from the mapfile, see below)
  4. validate current analysis output against a reference

The mapfile should describe the internal mapping of paths and/or ROOT files and is meant to be fairly static; it is not needed for identical files. The filter file, on the other hand, should only contain the objects of interest and is expected to change more frequently (e.g. when someone is debugging a certain set of distributions).

For this, I propose a new section in the config, validation, since it does not require looping over events:

```yaml
---
validation:
  input: <list of input files OR output of current running ("analysis"?)>
  reference: <list of reference files, order aligned with "input">
  filter: <path to filter file, optional>
  mapping: <path to mapping file, optional>
# uses the output section
```
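For illustration, reading that section could look roughly like this (a sketch assuming PyYAML and the key names proposed above; the config file name and the compare_files helper are placeholders):

```python
import yaml

with open("config.yaml") as handle:       # hypothetical config file name
    cfg = yaml.safe_load(handle)

validation = cfg["validation"]
inputs = validation["input"]              # list of files or current analysis output
references = validation["reference"]      # aligned with "input"
filter_file = validation.get("filter")    # optional
mapping_file = validation.get("mapping")  # optional

for current, reference in zip(inputs, references):
    compare_files(current, reference)     # e.g. the sketch above
```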

The output of the validation module will be the diff and ratio plots for each compared pair of histograms (r = reference, c = current).

Example mapfile:

```yaml
---
path/to/my/input_hist:/path/to/corresponding/
...
```
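Loading and applying such a mapping could then be as simple as the following (a sketch; it assumes the mapfile is plain YAML key/value pairs, and the variable names are placeholders):

```python
import yaml

with open(mapping_file) as handle:
    mapping = yaml.safe_load(handle)

# fall back to the original name when there is no entry,
# i.e. the identical-files case needs no mapfile at all
reference_name = mapping.get(current_name, current_name)
```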

Example filter file:

```yaml
include:
  - list of inputs to include
exclude:
  - list of inputs to exclude
```
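Applying the filter could then be a simple include/exclude check; a sketch (glob-style matching via fnmatch is my assumption, not part of the proposal):

```python
from fnmatch import fnmatch


def passes_filter(name, include=None, exclude=None):
    """Return True if the histogram with this name should be compared."""
    if include and not any(fnmatch(name, pattern) for pattern in include):
        return False
    if exclude and any(fnmatch(name, pattern) for pattern in exclude):
        return False
    return True
```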

Obviously, this might change once we actually put the module into action.