cms-analysis / HiggsAnalysis-CombinedLimit

CMS Higgs Combination toolkit.
https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest
Apache License 2.0
75 stars 384 forks source link

Speed up datacard parsing #787

Closed nsmith- closed 2 years ago

nsmith- commented 2 years ago

Two regular expressions were applied to all lines in card, both of which were 10x slower than doing the equivalent transformation by hand in python.

The latter was applied universally, whereas now it is only checked for parameter effect size arguments. This means that the interpretation of arguments for certain nuisances is now more restricted, e.g.:

unc1 unif - 1  ...
unc2 gmN -  ...

used to be equivalent to

unc1 unif 0 1  ...
unc2 gmN 0  ...

but now would raise an error on converting - to int or float. (of course for gmN you wouldn't really want 0 anyway) Using --- to indicate no effect on the process column still works the same as before.

nsmith- commented 2 years ago

If we want to recover the old behavior of converting all --- tokens to 0 we could do

f = ["0" if tok == "-" * len(tok) else tok for tok in l.split()]

or similar. But as far as I know, the only place we want this behavior is on nuisance effects

amarini commented 2 years ago

Could it be because the regexp were not compiled?

nsmith- commented 2 years ago

No, I tried pre-compiling them at first and it did not help much

hcombbot commented 2 years ago

Pull Request Test. Summary

Running options:

hcombbot commented 2 years ago

Pull Request Test. Summary

Running options:

hcombbot commented 2 years ago

Pull Request Test. Summary

Running options:

hcombbot commented 2 years ago

Pull Request Test. Summary

Running options: