This adds the first version of the datacard validation tools - so far it has been tested on the 2016 VH(bb) datacards. The full set of checks takes around 10 minutes to run on those cards, but this is driven by a shape comparison check; without that check it takes less than a minute.
The implementation consists of C++ functions to perform a set of checks:
Do the up and down templates of a shape uncertainty shift the normalisation in opposite directions (one increases it, the other decreases it), or do both shift it in the same direction?
Are there processes with 0 yield in the datacard?
Are there uncertainties where the up and/or down templates have 0 yield even though the nominal yield is nonzero?
Are there uncertainties which have a normalisation effect larger than some percentage specified by the user?
Do the shape uncertainties have a real shape effect (up/down templates different from nominal - defined by comparing the squared difference between bin content in the normalised nominal and up/down templates, summed over all bins)?
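Two of the checks above are simple enough to sketch concretely. The following is an illustrative Python sketch, not the actual C++ implementation; the function names and the use of plain lists as stand-in histograms are assumptions made for the example:

```python
# Illustrative sketch (not the CombineHarvester implementation) of two of
# the checks listed above, using plain Python lists as stand-in histograms.
# All function names here are hypothetical.

def norm_effect_same_direction(nominal, up, down):
    """Return True if the up and down templates shift the total yield
    in the same direction relative to the nominal template."""
    n, u, d = sum(nominal), sum(up), sum(down)
    return (u - n) * (d - n) > 0  # both increase or both decrease

def shape_difference(nominal, varied):
    """Sum over bins of the squared bin-content difference between the
    normalised nominal and varied templates, as described above."""
    n_tot, v_tot = sum(nominal), sum(varied)
    return sum((b / n_tot - c / v_tot) ** 2
               for b, c in zip(nominal, varied))

# Example: a systematic that only scales the nominal template has no
# real shape effect, and here both variations shift the norm upwards.
nominal = [10.0, 20.0, 30.0]
up      = [11.0, 22.0, 33.0]   # pure +10% scaling
down    = [10.5, 21.0, 31.5]   # pure +5% scaling
print(norm_effect_same_direction(nominal, up, down))   # True: one-sided
print(shape_difference(nominal, up) < 1e-12)           # True: no shape effect
```

In the real checks a metric like `shape_difference` would be compared against a small threshold to flag uncertainties with no genuine shape effect.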
Two versions of each function are provided: one that writes the output to a json file, and one that writes output to the screen. The second version gives users who create their datacards with CombineHarvester the option of running some or all of the checks within their datacard creation setup. The json-writing versions are essentially internal to an overarching datacard validation function which runs all the checks. This function is called from a script, ValidateDatacards.py, which writes the output to a json file and reads it back in to report how many warnings of each type occurred and, depending on the level of detailed printing specified, for which processes and uncertainties in which bins the problems occurred.
The script also has a read-only mode which can be used if the validation checks were already run.
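As a rough illustration of how such a json report could be summarised into per-check warning counts, here is a minimal sketch; the json layout shown ({check name -> bin -> offending entries}) is an assumption made for the example, not the actual file format written by ValidateDatacards.py:

```python
# Hypothetical sketch of summarising a validation report into warning
# counts per check type. The report structure below is an assumption
# for illustration only, not the actual ValidateDatacards.py format.
import json

report = {
    "emptyProcessShape": {
        "bin1": {"proc_a": {}},
        "bin2": {"proc_b": {}, "proc_c": {}},
    },
    "smallShapeEff": {
        "bin1": {"syst_x": {}},
    },
}

def count_warnings(report):
    """Count how many warnings of each type occurred, over all bins."""
    return {check: sum(len(entries) for entries in bins.values())
            for check, bins in report.items()}

print(json.dumps(count_warnings(report)))
# prints {"emptyProcessShape": 3, "smallShapeEff": 1}
```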
We may want to add other checks in future versions; these should be straightforward to integrate.
Finally, before merging and advertising more widely, the shape comparison test might need some refining. In addition the tools should be tested on more datacards, for example all inputs to the grand combination.