SpeysideHEP / spey

Smooth inference for reinterpretation studies
https://spey.readthedocs.io
MIT License

Differences between SModelS and MadStats upper limit on mu with pyhf backend #3

Closed APMDSLHC closed 1 year ago

APMDSLHC commented 1 year ago

System Settings

Fedora Linux 35, Python 3.9.12

Describe the bug

The SModelS ulcomputer.getUpperLimitOnMu() and MadStats stat_model.computeUpperLimitOnMu() functions do not return exactly the same results. The largest difference I have found so far is around 3%, but it may be larger for other models. (The CLs values, on the other hand, always seem to agree.)

To Reproduce

import json, jsonpatch, pyhf, madstats, copy
from smodels.tools.srCombinations import _getPyhfComputer
from smodels.experiment.databaseObj import Database
from smodels.experiment.datasetObj import CombinedDataSet

### SModelS

db = Database('./smodels-database')
expResult = db.getExpResults(analysisIDs=['ATLAS-SUSY-2019-08'],dataTypes=['efficiencyMap'])[0]
combinedDataset = CombinedDataSet(expResult)
nsig = [0.64171013584248, 0.20151994828896, 0.36729875457036, 2.12157276118776, 2.4906940927291803, 2.2180703508504, 1.0707552361033201, 2.1970845124999197, 3.5842286295059407]
ulcomputer = _getPyhfComputer(combinedDataset, nsig)

patch = [{'op': 'add', 'path': '/channels/5/samples/0', 'value': {'data': [0.04308822678442124, 0.013531245259294129, 0.02466261813643536], 'modifiers': [{'data': None, 'type': 'normfactor', 'name': 'mu_SIG'}, {'data': None, 'type': 'lumi', 'name': 'lumi'}], 'name': 'bsm'}}, {'op': 'add', 'path': '/channels/6/samples/0', 'value': {'data': [0.14245498577592744, 0.16723998254638267, 0.14893440661610954], 'modifiers': [{'data': None, 'type': 'normfactor', 'name': 'mu_SIG'}, {'data': None, 'type': 'lumi', 'name': 'lumi'}], 'name': 'bsm'}}, {'op': 'add', 'path': '/channels/7/samples/0', 'value': {'data': [0.0718968610075867, 0.1475252928876509, 0.24066638098619197], 'modifiers': [{'data': None, 'type': 'normfactor', 'name': 'mu_SIG'}, {'data': None, 'type': 'lumi', 'name': 'lumi'}], 'name': 'bsm'}}]

# load the background-only workspace; the context manager closes the file,
# and json.load already returns a fresh object, so no deepcopy is needed
with open("smodels/tim_db/13TeV/ATLAS/ATLAS-SUSY-2019-08-eff/BkgOnly.json", "r") as f:
    jsonFile = json.load(f)

wsDict = jsonpatch.apply_patch(jsonFile, patch)
workspace = pyhf.Workspace(wsDict)

msettings = {
    "normsys": {"interpcode": "code4"},
    "histosys": {"interpcode": "code4p"},
}

model = workspace.model(modifier_settings=msettings)

args = {"test_stat": "qtilde"}

# note: `model` and `args` above are built but not used by the SModelS computer
muUL_pyhfOnly = ulcomputer.getUpperLimitOnMu()

print("Pyhf only, muUL:", muUL_pyhfOnly)

### MadStats

signal = patch
background = jsonFile

xsec = 7.19E-03  # fb

stat_model = madstats.get_multi_region_statistical_model(
    analysis="atlas_susy_2019_08",
    signal=signal,
    background=background,
    xsection=xsec * 0.001,  # presumably converting fb -> pb
)

MS_muUL = stat_model.computeUpperLimitOnMu()
print(f"Upper limit on POI : {MS_muUL}")

print("Difference on muUL:", (muUL_pyhfOnly - MS_muUL) * 100 / muUL_pyhfOnly, "%")

Expected behaviour

The two computations should return the same upper limit on mu.

Additional information

The patch and the signal yields were computed with SModelS (using the TChiWH_350_180_350_180.slha file); I cannot attach the file because of its type. The same applies to the background-only statistical model of the analysis (ATLAS-SUSY-2019-08).

jackaraz commented 1 year ago

Hi @APMDSLHC, could you look into the source of the difference, e.g. $\hat{\mu}$ or $\min(-\log\mathcal{L})$? If these agree, the discrepancy might come from the `brentq` root finder.
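To illustrate why the root finder alone can matter: if the two codes bracket $CL_s(\mu) = \alpha$ with different tolerances, the returned $\mu_{UL}$ can differ at the percent level even when the underlying likelihoods agree exactly. A minimal stdlib sketch with a toy $CL_s$ curve (the exponential form and the tolerances are illustrative assumptions, not either code's actual settings):

```python
import math

def bisect_root(f, lo, hi, xtol):
    """Plain bisection: shrink [lo, hi] until it is narrower than xtol."""
    assert f(lo) > 0 > f(hi)
    while hi - lo > xtol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy CLs curve: CLs(mu) = exp(-mu); the 95% upper limit solves
# CLs(mu) = 0.05, i.e. mu_UL = ln(20) ~ 2.9957.
cls_minus_alpha = lambda mu: math.exp(-mu) - 0.05

mu_fine = bisect_root(cls_minus_alpha, 0.0, 10.0, xtol=1e-8)   # tight tolerance
mu_coarse = bisect_root(cls_minus_alpha, 0.0, 10.0, xtol=0.1)  # loose tolerance

rel_diff = abs(mu_coarse - mu_fine) / mu_fine * 100  # percent difference
```

With these toy settings the two limits already differ at the sub-percent level; looser tolerances or different bracketing strategies can push the gap to a few percent. Comparing $\hat{\mu}$ and the minimum NLL first, as suggested above, therefore isolates whether the discrepancy is statistical or purely numerical.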

jackaraz commented 1 year ago

Hi @APMDSLHC, please note that with the latest updates you will get different values of $\mu_{UL}$ between SModelS and MadStats. The legacy code used the wrong $CL_s$ values for expected statistical testing. There are now three distinct expectation types, unified across all backends, each with well-defined differences.
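For context on why the choice of expectation matters: observed and expected $CL_s$ limits solve different equations, so conflating them shifts $\mu_{UL}$. A textbook stdlib toy for a single Gaussian measurement $x \sim N(\mu, 1)$ (this is only an illustration of the observed-vs-expected split; it does not model the three expectation types, e.g. pre-fit vs post-fit backgrounds, that the updated code distinguishes):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cls(mu, x, sigma=1.0):
    """CLs for a single Gaussian measurement x ~ N(mu, sigma)."""
    clsb = phi((x - mu) / sigma)  # p-value under signal + background
    clb = phi(x / sigma)          # p-value under background only
    return clsb / clb

def upper_limit(x, alpha=0.05, sigma=1.0):
    """Bisect CLs(mu) = alpha for mu in [0, 10]."""
    lo, hi = 0.0, 10.0
    while hi - lo > 1e-10:
        mid = 0.5 * (lo + hi)
        if cls(mid, x, sigma) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

observed = upper_limit(x=0.5)  # limit from the actual observation
expected = upper_limit(x=0.0)  # background-only median ("expected") limit
```

In this toy the expected limit lands at the familiar $2\,\Phi(-\mu) = 0.05 \Rightarrow \mu \approx 1.96$, while an upward fluctuation in the data pushes the observed limit higher, so any mismatch in which $CL_s$ ensemble a backend uses propagates directly into $\mu_{UL}$.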