cms-analysis / HiggsAnalysis-CombinedLimit

CMS Higgs Combination toolkit.

https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest

Apache License 2.0

75 stars 384 forks source link

Speed up datacard parsing #787

Closed nsmith- closed 2 years ago

nsmith- commented 2 years ago

Two regular expressions were applied to all lines in card, both of which were 10x slower than doing the equivalent transformation by hand in python.

\\s*#.* removes comments and whitespace at the end of the line
(?<=\\s)-+(\\s|$) converts any - entries to 0

The latter was applied universally, whereas now it is only checked for parameter effect size arguments. This means that the interpretation of arguments for certain nuisances is now more restricted, e.g.:

unc1 unif - 1  ...
unc2 gmN -  ...

used to be equivalent to

unc1 unif 0 1  ...
unc2 gmN 0  ...

but now would raise an error on converting - to int or float. (of course for gmN you wouldn't really want 0 anyway) Using --- to indicate no effect on the process column still works the same as before.

nsmith- commented 2 years ago

If we want to recover the old behavior of converting all --- tokens to 0 we could do

f = ["0" if tok == "-" * len(tok) else tok for tok in l.split()]

or similar. But as far as I know, the only place we want this behavior is on nuisance effects

amarini commented 2 years ago

Could it be because the regexp were not compiled?

nsmith- commented 2 years ago

No, I tried pre-compiling them at first and it did not help much

hcombbot commented 2 years ago

Pull Request Test. Summary

Running options:

MODE : cmssw
COMBINE_TAG : 112x
COMBINE_REPO : cms-analysis
COMBINE_MERGE : cms-analysis/t2wregex

GITHUB_PR : 787 Ratio to reference values:

comb_2019_hbb_boosted_standalone	comb_2019_hgg	comb_2019_hmm	comb_2019_htt	comb_2019_hww	comb_2019_tth_hbb	comb_2019_tth_hgg	comb_2019_tth_multilepton	comb_2019_vh_htt	comb_2019_vhbb	comb_2019_vhbb2017
1.00	0.96	0.99	1.30	1.00	1.00	0.67	1.00	1.01	1.00	1.00

You can find more detail at https://gitlab.cern.ch/cms-hcg/performances/ci/-/pipelines/4401589

hcombbot commented 2 years ago

Pull Request Test. Summary

Running options:

MODE : cmssw
COMBINE_TAG : 112x
COMBINE_REPO : cms-analysis
COMBINE_MERGE : cms-analysis/t2wregex

GITHUB_PR : 787 Ratio to reference values:

comb_2019_hbb_boosted_standalone	comb_2019_hgg	comb_2019_hmm	comb_2019_htt	comb_2019_hww	comb_2019_tth_hbb	comb_2019_tth_hgg	comb_2019_tth_multilepton	comb_2019_vh_htt	comb_2019_vhbb	comb_2019_vhbb2017
1.00	0.96	0.99	1.30	1.00	1.00	0.67	1.00	1.01	1.00	1.00

You can find more detail at https://gitlab.cern.ch/cms-hcg/performances/ci/-/pipelines/4401595

hcombbot commented 2 years ago

Pull Request Test. Summary

Running options:

MODE : cmssw
COMBINE_TAG : 112x
COMBINE_REPO : cms-analysis
COMBINE_MERGE : cms-analysis/t2wregex

GITHUB_PR : 787 Ratio to reference values:

comb_2019_hbb_boosted_standalone	comb_2019_hgg	comb_2019_hmm	comb_2019_htt	comb_2019_hww	comb_2019_tth_hbb	comb_2019_tth_hgg	comb_2019_tth_multilepton	comb_2019_vh_htt	comb_2019_vhbb	comb_2019_vhbb2017
1.00	0.96	0.99	1.00	1.00	1.00	0.67	1.00	1.01	1.00	1.00

You can find more detail at https://gitlab.cern.ch/cms-hcg/performances/ci/-/pipelines/4435886

hcombbot commented 2 years ago

Pull Request Test. Summary

Running options:

MODE : cmssw
COMBINE_TAG : 112x
COMBINE_REPO : cms-analysis
COMBINE_MERGE : cms-analysis/t2wregex

GITHUB_PR : 787 Ratio to reference values:

comb_2019_hbb_boosted_standalone	comb_2019_hgg	comb_2019_hmm	comb_2019_htt	comb_2019_hww	comb_2019_tth_hbb	comb_2019_tth_hgg	comb_2019_tth_multilepton	comb_2019_vh_htt	comb_2019_vhbb	comb_2019_vhbb2017
1.00	0.96	0.99	1.00	1.00	1.00	0.67	1.00	1.01	1.00	1.00

You can find more detail at https://gitlab.cern.ch/cms-hcg/performances/ci/-/pipelines/4435889