cms-analysis / HiggsAnalysis-CombinedLimit

CMS Higgs Combination toolkit.
https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest
Apache License 2.0
75 stars 383 forks source link

Moved to ordered collections #936

Closed nucleosynthesis closed 6 months ago

nucleosynthesis commented 6 months ago

Main fix needed is in ModelTools

setNuisPdf = set(setNuisPdf) -> setNuisPdf = list(dict.fromkeys((setNuisPdf)))

nsmith- commented 6 months ago

FYI this will noticeably slow down text2workspace for large models, as the errline dictionary is very large. This was also seen in #791 as we tried to reduce the memory footprint.

nsmith- commented 6 months ago

Also relevant is https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/pull/759#pullrequestreview-975858053

nucleosynthesis commented 6 months ago

Thanks Nick - I suspect for many of these, we can revert to the standard dict as most cases will have a benign effect - the main one seems to be the ordering of the constraint terms - If someone wants to try removing the other instances of using OrderedDict, that would be helpful to find out which ones are really needed to maintain reproducibility

On Thu, Apr 11, 2024 at 2:20 PM Nicholas Smith @.***> wrote:

Also relevant is #759 (review) https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/pull/759#pullrequestreview-975858053

— Reply to this email directly, view it on GitHub https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/pull/936#issuecomment-2049681264, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMEVW7QJX2GQORKN726TL3Y42E3PAVCNFSM6AAAAABGCACQGGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBZGY4DCMRWGQ . You are receiving this because you authored the thread.Message ID: @.***>

nsmith- commented 6 months ago

I think that's a good plan, but perhaps more holistic solution is to refactor the parser along the lines of #818

nucleosynthesis commented 6 months ago

Thanks for checking! Yes I suspected the others were somewhat overkill - it might be useful to have a PR open with those other ones reverted (if you happen to have that locally from having done the tests) and then once we are confident, we can push that in - but I also second Nick's suggestion to re-think the way we Parse the inputs more generally

On Fri, Apr 12, 2024 at 2:18 PM kcormi @.***> wrote:

@.**** commented on this pull request.

In python/ModelTools.py https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/pull/936#discussion_r1562539971 :

@@ -808,7 +810,7 @@ def doFillNuisPdfsAndSets(self): if p != "constr": nuisVars.add(self.out.var(c_param_name)) setNuisPdf.append(c_param_name)

  • setNuisPdf = set(setNuisPdf)
  • setNuisPdf = list(dict.fromkeys((setNuisPdf)))

I did a few tests, and it seems that this line (and line 826 below) changing the sets to an ordered type are enough for exact reproducibility of the fit values on the datacards where we saw the issue when recreating the workspace. Though it may be worth understanding this more clearly.

-- On the broader point I agree with your suggestion, @nsmith- https://github.com/nsmith- it would be nice to make those kinds of changes.

— Reply to this email directly, view it on GitHub https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/pull/936#pullrequestreview-1996985409, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMEVW6AWVBN2S2JWV4QSVDY47NI5AVCNFSM6AAAAABGCACQGGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSOJWHE4DKNBQHE . You are receiving this because you authored the thread.Message ID: <cms-analysis/HiggsAnalysis-CombinedLimit/pull/936/review/1996985409@ github.com>