cms-analysis / CombineHarvester

CMSSW package for the creation, editing and analysis of combine datacards and workspaces
cms-analysis.github.io/CombineHarvester/

Datacard validation tools #188

Closed adewit closed 5 years ago

adewit commented 5 years ago

This adds the first version of the datacard validation tools - so far it has been tested on the 2016 VH(bb) datacards. The full set of checks takes around 10 minutes to run on those cards, but this is driven by a shape comparison check; without that check it takes less than a minute.

The implementation consists of C++ functions to perform a set of checks:

Two versions of each function are provided: one that writes its output to a JSON file and one that prints it to the screen. The screen-printing version lets users who create their datacards with CombineHarvester run some or all of the checks within their datacard creation setup. The JSON-writing versions are essentially internal to an overarching datacard validation function that runs all the checks. That function is called from a script, ValidateDatacards.py, which writes the output to a JSON file and reads it back in to report how many warnings of each type occurred and, depending on the requested level of detail, for which processes and uncertainties in which bins the problems occurred. The script also has a read-only mode that can be used if the validation checks have already been run.
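To illustrate the reporting step only, here is a minimal sketch of how a JSON report could be summarised per warning type. The JSON layout (check name, then bin, then process, then a list of uncertainties), the check names and the function names are all assumptions made for this example; they are not taken from ValidateDatacards.py.

```python
import json
from collections import Counter

# ASSUMPTION: the report is nested as check -> bin -> process -> [uncertainties].
# This layout and the check names below are hypothetical, for illustration only.
EXAMPLE_JSON = """
{
  "hypothetical_shape_check": {
    "bin1": {"procA": ["systX", "systY"]},
    "bin2": {"procB": ["systX"]}
  },
  "hypothetical_empty_check": {}
}
"""


def count_warnings(report):
    """Count flagged (bin, process, uncertainty) entries for each check."""
    counts = Counter()
    for check, bins in report.items():
        for processes in bins.values():
            for uncertainties in processes.values():
                counts[check] += len(uncertainties)
    return counts


def summarise(report, verbosity=0):
    """Return summary lines; verbosity > 0 also lists bins and processes."""
    counts = count_warnings(report)
    lines = []
    for check in sorted(report):
        lines.append(f"{check}: {counts[check]} warning(s)")
        if verbosity > 0:
            for bin_name in sorted(report[check]):
                for proc in sorted(report[check][bin_name]):
                    systs = ", ".join(report[check][bin_name][proc])
                    lines.append(f"  bin {bin_name}, process {proc}: {systs}")
    return lines


# Reading an existing report without rerunning any checks mirrors the idea
# of a read-only mode; here the report is parsed from a string.
example_report = json.loads(EXAMPLE_JSON)

if __name__ == "__main__":
    print("\n".join(summarise(example_report, verbosity=1)))
```

Because the summary is built from the JSON alone, the same code works whether the checks were just run or the file was produced earlier.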

We may want to add other checks in future versions; these should be straightforward to integrate. Finally, before merging and advertising more widely, the shape comparison test might need some refining. In addition, the tools should be tested on more datacards, for example all inputs to the grand combination.