cms-analysis / HiggsAnalysis-CombinedLimit

CMS Higgs Combination toolkit.
https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest
Apache License 2.0
75 stars 381 forks source link

Different results with each run #883

Open nucleosynthesis opened 9 months ago

nucleosynthesis commented 9 months ago

As Pointed out by @adavidzh , when running HybridNew with the same (default) seed, the limit is different. The RooFit seed is set but not clear that ROOT gRandom is.

Possibly related to this line

https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/ec31c309bcd95f4c116ad6f3faf8018908f1fa8d/src/HybridNew.cc#L1682

Does anyone know (maybe @guitargeek ?) If this coukd be the cause and how to resolve it?

guitargeek commented 9 months ago

Hi, from the top of my head I have no idea. How can this be reproduced? Then I can help figuring out what's going on

nucleosynthesis commented 9 months ago

Running the following will show the behaviour

combine template-analysis-datacard.txt -M HybridNew --LHCmode LHC-limits
--rMax 2.0 --clsAcc 0.01

The datacard can be found in

data/tutorials/CAT23001/template-analysis-datacard.txt
nucleosynthesis commented 9 months ago

Some updates. It seems to be related to the "on the fly" workspace creation. If one first creates the binary workspace and uses that as input, then the repeated runs produces the same output eg.

With the text file directly:

combine template-analysis-datacard.txt -M HybridNew --LHCmode LHC-limits --rMax 2.0 --clsAcc 0.01
 -- Hybrid New --
Limit: r < 0.335698 +/- 0.0143468 @ 95% CL
Done in 0.15 min (cpu), 0.15 min (real)

combine template-analysis-datacard.txt -M HybridNew --LHCmode LHC-limits --rMax 2.0 --clsAcc 0.01
 -- Hybrid New --
Limit: r < 0.356193 +/- 0.0349485 @ 95% CL
Done in 0.18 min (cpu), 0.18 min (real)

first building the workspace:

text2workspace.py template-analysis-datacard.txt

combine template-analysis-datacard.root -M HybridNew --LHCmode LHC-limits --rMax 2.0 --clsAcc 0.01
 -- Hybrid New --
Limit: r < 0.346362 +/- 0.0134581 @ 95% CL
Done in 0.24 min (cpu), 0.25 min (real)

combine template-analysis-datacard.root -M HybridNew --LHCmode LHC-limits --rMax 2.0 --clsAcc 0.01 
 -- Hybrid New --
Limit: r < 0.346362 +/- 0.0134581 @ 95% CL
Done in 0.24 min (cpu), 0.24 min (real)

Could the binary file creation have something to do with it?

nucleosynthesis commented 9 months ago

Digging a bit more, it seems like this might be intentional. In these lines https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/ec31c309bcd95f4c116ad6f3faf8018908f1fa8d/src/Combine.cc#L311-L322 , a temporary file is created to store the workspace if the input is the datacard. This allows one to run multiple commands in parallel without worrying about each one writing over eachother.

However, this c++ method cannot be seeded for reproducibility and since it seems RooFits random number generator is tied to TRandom (which in turn is tied to this one - is that right @guitargeek ?), we can't avoid non-reproducible results if one runs the same command over and over on the .txt file.

@adavidzh , I think we just need to make it clear to a user that if they run the commands that use toys, they will get different but consistent results each time unless they first convert to a binary file and use that as the input.

adavidzh commented 8 months ago

a temporary file is created to store the workspace if the input is the datacard. This allows one to run multiple commands in parallel without worrying about each one writing over eachother.

I would argue that we should be able to create a unique - per job - deterministic identifier. I.e., we do not need a random thing, just a unique one, that can e.g., be made out of a hash of things like the datacard and command line options.

nucleosynthesis commented 8 months ago

To be clear,

As far as I can tell this is done by ROOT rather than something we can control. Perhaps Someone more expert in the inner workings of root can see if it can be bypassed

On Sun, 24 Dec 2023, 16:47 André David, @.***> wrote:

a temporary file is created to store the workspace if the input is the datacard. This allows one to run multiple commands in parallel without worrying about each one writing over eachother.

I would argue that we should be able to create a unique - per job - deterministic identifier. I.e., we do not need a random thing, just a unique one, that can e.g., be made out of a hash of things like the datacard and command line options.

— Reply to this email directly, view it on GitHub https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/issues/883#issuecomment-1868556397, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMEVW4MY6EHAEC3JRDSKJ3YLBMC3AVCNFSM6AAAAABAXQYKTSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGU2TMMZZG4 . You are receiving this because you authored the thread.Message ID: @.*** com>

nucleosynthesis commented 8 months ago

Hint @guitargeek 😉

adavidzh commented 8 months ago

I don't get it @nucleosynthesis: the code you reference (with mkdtemp and mkstemp) is combine's.

nucleosynthesis commented 8 months ago

Right,

https://man7.org/linux/man-pages/man3/mkstemp.3.html

https://man7.org/linux/man-pages/man3/mkdtemp.3.html

Are what we use to guarantee to uniqueness but it seems that uses the same seed that RooFit will then use.

Using the same datacard and command would imply the same hash so not sure that would work with concurrent identical command lines (which we can't do currently)

Too much of this is my own speculation (based on some minimal testing) though, so perhaps there is simply a way to avoid the clash between the seed that RooFit uses and the one being triggered by the call to mkstemp.

kcormi commented 5 months ago

Just for completeness, I don't think this is limited to toy-based methods like HybridNew. I see the same effect (slightly different numerical values) when running the asymptotic Significance method from the datacards, but if I produce the workspace first and rerun the results are identical. Still, negligibly small differences, but it appears to be the same issue.