cms-analysis / HiggsAnalysis-CombinedLimit

CMS Higgs Combination toolkit.
https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest
Apache License 2.0

N+1 for Poisson expectation #161

Open adavidzh opened 9 years ago

adavidzh commented 9 years ago

We have recently seen a pre-fit Asimov pull caused by the expected Poisson rate being set to N+1 in https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/development/python/ModelTools.py#L169

It's unclear (to me) why exactly this is the case, but it has been like that since 2011.

adavidzh commented 9 years ago

From @gpetruc on 12 Dec 2014:

So, about that N+1 vs N: in practice, for gmN combine makes the pre-fit Asimov with the mean expected background instead of the mode (which is what is normally used for an Asimov).

Now, if N is small, which is the case where the difference matters, one is probably outside the asymptotic regime, so the Asimov isn't necessarily the right answer, especially since outside the asymptotic regime the median, mean, and mode of the expected distribution of any quantity like a limit or a significance can differ substantially.

That said, I agree that it is somewhat inconsistent, so maybe we should change it.

So, for a Poisson with mean lambda one has:

  • the mode is <= lambda.
  • the median is ~ lambda+1/3 - 0.02/lambda, which clearly breaks for lambda=0 (as does the notion of Poisson mean).

Is the args[0]+1 in https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/development/python/ModelTools.py#L169 an attempt at the median?

Anyway, from what I have understood of Poisson medians, I would propose - if that is the relevant quantity for Asimovs:

  • mean=0 -> median = 0
  • mean>0 -> median = Floor[mean + 1/3 - 0.02/mean]
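A quick numerical cross-check of that approximate median formula (a standalone scipy sketch, not combine code; the lambda values below are arbitrary examples):

from math import floor
from scipy.stats import poisson

# Compare Floor[lambda + 1/3 - 0.02/lambda] with the exact Poisson median,
# i.e. the smallest k whose CDF reaches 0.5.
for lam in [0.5, 1, 2, 5, 10, 50]:
    approx = floor(lam + 1.0 / 3 - 0.02 / lam)
    exact = int(poisson.median(lam))
    print(f"lambda={lam:>4}: approximate median={approx}, exact median={exact}")

For these values the two agree; as noted above, the formula is only undefined at lambda = 0.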

gpetruc commented 9 years ago

Hi André,

When considering the nuisance parameter, you should be looking at a Gamma distribution, not a Poisson. The Poisson is for the event yield.
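For reference, a minimal way to see where the +1 enters in the Gamma picture (assuming, as in the later discussion in this thread, a flat prior and a gmN control region with observed count $N$ and transfer factor $\theta$): the posterior for the transferred background yield is $\Gamma(x \mid k = N+1, \theta) \propto x^{N} e^{-x/\theta}$, whose mean is $(N+1)\,\theta$ while its mode is $N\,\theta$, so filling the pre-fit Asimov with the mean rather than the mode gives the N+1.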


nucleosynthesis commented 6 years ago

Closing the issue, since this is now a known feature.

nsmith- commented 1 year ago

I want to revisit this topic, as there still seems to be an inconsistency, given that a gmN nuisance should really be equivalent to a rateParam plus control-region setup. Take for example the following cards:

card_CR.txt:

imax 2 number of bins
jmax 2 number of processes minus 1
kmax 1 number of nuisance parameters
---------------------------------------
shapes * * FAKE
---------------------------------------
bin          SR    CR  
observation  12    10
---------------------------------------
bin          SR    SR    SR     CR
process     sig  bkg1  bkg2   bkg2
process       0     1     2      1
rate          5    10     5     10
---------------------------------------
lumi lnN   1.02  1.02  1.02      -
---------------------------------------
bkg2_norm rateParam * bkg2 1.0 [0,2]

and card_gmN.txt:

imax 1 number of bins
jmax 2 number of processes minus 1
kmax 2 number of nuisance parameters
---------------------------------------
shapes * * FAKE
---------------------------------------
bin          SR
observation  12
---------------------------------------
bin                 SR    SR    SR
process            sig  bkg1  bkg2
process              0     1     2
rate                 5    10     5
---------------------------------------
lumi lnN          1.02  1.02  1.02
bkg2_norm gmN 10     -     -   0.5

These two cards construct identical models (apart from the scaling of bkg2_norm by a factor of 10 in the gmN case). This was confirmed by evaluating the DeltaNLL for each model on a random grid of points in the 3D parameter space; the maximum difference was less than 1e-14.
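As a cross-check that can be run outside combine, here is a standalone numpy/scipy sketch of that comparison, with both likelihoods written out by hand under the usual combine conventions (a symmetric lnN entering as kappa^theta with a unit-Gaussian constraint, and gmN entering as a Poisson constraint on the control-region count); the grid ranges and random seed are arbitrary, and this is not the actual combine/RooFit code:

import numpy as np
from scipy.stats import norm, poisson

D, d = 12, 10                 # observed counts in the SR and CR
s, b1, b2 = 5.0, 10.0, 5.0    # nominal SR yields for sig, bkg1, bkg2
kappa = 1.02                  # lumi lnN

def nll_cr(r, lumi, mu):      # mu = rateParam bkg2_norm of card_CR.txt
    sr = kappa**lumi * (r * s + b1 + mu * b2)
    return -(poisson.logpmf(D, sr) + poisson.logpmf(d, 10.0 * mu) + norm.logpdf(lumi))

def nll_gmn(r, lumi, theta):  # theta = gmN nuisance bkg2_norm of card_gmN.txt
    sr = kappa**lumi * (r * s + b1 + 0.5 * theta)
    return -(poisson.logpmf(D, sr) + poisson.logpmf(d, theta) + norm.logpdf(lumi))

rng = np.random.default_rng(123)
pts = rng.uniform([-2.0, -2.0, 0.2], [2.0, 2.0, 1.8], size=(100, 3))  # (r, lumi, mu)
dnll_diff = [(nll_cr(r, l, m) - nll_cr(0, 0, 1.0)) - (nll_gmn(r, l, 10 * m) - nll_gmn(0, 0, 10.0))
             for r, l, m in pts]
print(max(abs(x) for x in dnll_diff))   # ~0: identical DeltaNLL on the whole grid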

However, the gmN model shows the non-closure behavior for the Asimov expected results:

text2workspace.py card_gmN.txt
combine -M FitDiagnostics -t -1 --expectSignal 0 \
  --rMin -2 --rMax 2 card_gmN.root --saveWorkspace -n .gmN --cminDefaultMinimizerTolerance 1e-5

returning Best fit r: 0.0999897 -0.801731/+0.90802 (68% CL), with the full fit result:

  RooFitResult: minimized FCN value: -4.39075e-09, estimated distance to minimum: 3.56105e-10
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter    FinalValue +/-  Error   
  --------------------  --------------------------
             bkg2_norm    1.0000e+01 +/-  3.15e+00
                  lumi    1.2822e-05 +/-  1.00e+00
                     r    9.9990e-02 +/-  8.25e-01

For the CR card, we have

text2workspace.py card_CR.txt --X-allow-no-signal
combine -M FitDiagnostics -t -1 --expectSignal 0 \
  --rMin -2 --rMax 2 card_CR.root --saveWorkspace -n .CR --cminDefaultMinimizerTolerance 1e-5

returning Best fit r: 8.76099e-06 -0.790327/+0.895702 (68% CL) and the full fit result:

  RooFitResult: minimized FCN value: -3.21762e-09, estimated distance to minimum: 6.86606e-11
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter    FinalValue +/-  Error   
  --------------------  --------------------------
             bkg2_norm    1.0000e+00 +/-  3.11e-01
                  lumi   -1.9665e-06 +/-  1.00e+00
                     r    8.7610e-06 +/-  8.14e-01

This has a direct effect on the asymptotic pre-fit expected limit: for gmN we have combine -M AsymptoticLimits --run blind card_gmN.root returning

 -- AsymptoticLimits ( CLs ) --
Expected  2.5%: r < 1.0703
Expected 16.0%: r < 1.4189
Expected 50.0%: r < 2.0000
Expected 84.0%: r < 2.8769
Expected 97.5%: r < 4.0161

and combine -M AsymptoticLimits --run blind card_CR.root returning

 -- AsymptoticLimits ( CLs ) --
Expected  2.5%: r < 0.9521
Expected 16.0%: r < 1.3054
Expected 50.0%: r < 1.8750
Expected 84.0%: r < 2.7420
Expected 97.5%: r < 3.8750

The difference comes from the fact that $q_{\mu,A}$ is biased low for the gmN card, since the Asimov dataset does not correspond to r=0 but rather to r=0.1: the Asimov SR yield is built with bkg2_norm at its pre-fit value of N+1 = 11 (15.5 events in total), while the fit pulls bkg2_norm back to 10, so the extra 0.5 events are absorbed by r = 0.5/5 = 0.1. A scan:

combine -M GenerateOnly -t -1 card_gmN.root -n .gmNasimov --saveToys
combine -M MultiDimFit card_gmN.root -n .gmNscan \
    -D higgsCombine.gmNasimov.GenerateOnly.mH120.123456.root:toys/toy_asimov --algo grid --rMin 0 --rMax 0.5
combine -M GenerateOnly -t -1 card_CR.root -n .CRasimov --saveToys
combine -M MultiDimFit card_CR.root -n .CRscan \
    -D higgsCombine.CRasimov.GenerateOnly.mH120.123456.root:toys/toy_asimov --algo grid --rMin 0 --rMax 0.5

shows the effect more clearly:

[figure: likelihood scans in r of the gmN and CR Asimov datasets]

We can recover the CR card behavior by pre-setting the nuisance parameter: combine -M AsymptoticLimits --run blind card_gmN.root --setParameters bkg2_norm=10.

Of course, all observed quantities are immune to such problems, and the post-fit expected is also immune since the Asimov is generated with an appropriate value of bkg2_norm, i.e.

combine -M AsymptoticLimits card_CR.root
combine -M AsymptoticLimits card_gmN.root

produce identical results.

Aside from the inconsistency, there may be a question as to which setup is more accurate w.r.t. toys. I tried to evaluate this by running

combine -M HybridNew card_CR.root --LHCmode LHC-limits -t -1 --expectedFromGrid 0.5
combine -M HybridNew card_gmN.root --LHCmode LHC-limits -t -1 --expectedFromGrid 0.5
combine -M HybridNew card_gmN.root --LHCmode LHC-limits -t -1 --setParameters bkg2_norm=10  --expectedFromGrid 0.5

each with various settings of --toysH, --interpAcc, and --rAbsAcc, but I could not find a way to reduce the numerical uncertainty enough to show a clear relation between the three setups. In general, all three are close to each other, at around 1.8.

kcormi commented 10 months ago

I agree with @nsmith- here that the two setups should be identical, and with that understanding, it does seem that this should be fixed. Is there something I am missing? If we agree on it, it's extremely easy to implement.

nucleosynthesis commented 10 months ago

Hi all,

I think we should be a bit careful here. In the land of low statistics, even the default toy-based methods may not be correct. Bob has a paper all about this, specifically in the on/off problem (where the gamma arises), and if one uses our "Cousins-Highland" approach for the nuisance parameter, then its default value shouldn't matter (the observed value does, as we use that to construct the likelihood). It might make more sense to ask people to use toys more carefully when we are in low-statistics regimes where this makes a difference. Basically, I think neither the Asimov nor our standard toy-based approaches are really correct when the difference between N and N+1 is significant.

Nick


nsmith- commented 10 months ago

I would characterize this issue as more about the consistency of two ways of developing a model than whether or not the usual statistical methods work well for such a model.

MatthewDittrich commented 2 months ago

Hi all,

I was wondering about something I am seeing with the gmN parameter that may also be related to this N+1 issue.

Consider the following cards (Card1 and Card2 respectively), where the only difference between them is the addition of the gmN parameter in Card2.

Card1:

imax 1 number of bins
jmax 2 number of processes minus 1
kmax 1 number of nuisance parameters
---------------------------------------
shapes * * FAKE
---------------------------------------
bin          SR
observation  12
---------------------------------------
bin                 SR    SR    SR
process            sig  bkg1  bkg2
process              0     1     2
rate                 5    10     5
---------------------------------------
lumi lnN          1.02  1.02  1.02

and Card2:

imax 1 number of bins
jmax 2 number of processes minus 1
kmax 2 number of nuisance parameters
---------------------------------------
shapes * * FAKE
---------------------------------------
bin          SR
observation  12
---------------------------------------
bin                 SR    SR    SR
process            sig  bkg1  bkg2
process              0     1     2
rate                 5    10     5
---------------------------------------
lumi lnN          1.02  1.02  1.02
bkg2_norm gmN 10     -     -   0.5

Executing combine -M Significance card*.txt --expectSignal=1 -t -1 on these returns two different expected significances, but not in the direction I would expect.

For Card1: 1.22374
For Card2: 1.22432

The expected significance actually increases with the addition of the gmN systematic, which I would not expect.

amarini commented 2 months ago

I think we do something for the scaling in the case of a gamma parameter.

See the likelihood scan from combine vs. an implementation in Python as two separate bins (figures below). The combine one seems to cross, which indicates that the nuisance affects the signal more than the background. But honestly, I don't remember exactly how it is implemented. Does somebody already have the same datacard implemented with rateParams?

[figure: likelihood scan from combine]

[figure: likelihood scan implemented directly in Python as two separate bins]

The command used:

combine -M MultiDimFit --algo=grid --rMin=-2 --rMax=4 -t -1 --expectSignal=1 dc2.txt -n dc2

nsmith- commented 2 months ago

Does somebody have already the same dc implemented w/ rateparams?

scroll up :)

amarini commented 2 months ago

:+1: (actually, I should have read the thread in more detail, as apparently it is about something else :D).

I think there is indeed a mistake in the pre-fit assignment that is only reflected in the gmN Asimov (as said above). It is true that we need to think in terms of a gamma distribution, as Giovanni said, but it is worth noticing that the gamma is a probability distribution for the parameter and not the likelihood itself.

(Notation: the probability that I mess it up is 1 😉.) Let $p(\vec{x} \mid \vec{p})$ be a probability distribution in $\vec{x}$ with parameters $\vec{p}$ (the vector symbols are dropped from here on), let $\mathcal{P}(d \mid \lambda)$ be the Poisson distribution, and let $\Gamma(x \mid k, \theta)$ be the gamma distribution.

What we want as the likelihood (i.e., with a CR from which the background estimate is derived) is: $\mathcal{L} = \mathcal{P}(D \mid r s + \dots + \mu \tau)\, \mathcal{P}(d \mid \mu)$ (1), where $D$ and $d$ are the observed data in the SR and CR, $\mu$ is the background normalization, and $\tau$ is a transfer factor (you can put it either on the left or on the right).

Now, it is known [Ref. 1] that the Bayesian posterior of the Poisson term in (1), with a uniform prior on the parameter, is the Gamma distribution (just look at the functional forms of the gamma and the Poisson: you need to exchange a parameter with the variable and rearrange the parameter values). In this case we can write our likelihood function as (we just need to match which term is which): $\mathcal{L} = \mathcal{P}(D \mid r s + \dots + \mu\tau)\, \Gamma(\mu\tau \mid d+1, \tau)$. Note again that in the $\Gamma$ the parameter now lives in the probability space, as we applied Bayes' theorem.

For what concerns us, I think we should make sure that the pre-fit Asimov is indeed identical and not shifted. In the end, we are choosing the first implementation, with the Poisson (though they are equivalent).

You can check with a numerical calculation that the formulas above yield the same result. For example (adding a small vertical shift between the two curves for plotting; the axes are $-\log\mathcal{L}$ vs. $\mu$):

[figure: $-\log\mathcal{L}$ vs. $\mu$ for the two forms, offset vertically for visibility]
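A minimal numerical version of that check (a standalone scipy sketch using the numbers from the cards in this thread, D = 12, d = 10, tau = 0.5, at a fixed r; not the combine implementation). The two forms give $-\log\mathcal{L}$ curves in $\mu$ that differ only by a $\mu$-independent constant, i.e. they are equivalent as likelihoods:

import numpy as np
from scipy.stats import gamma, poisson

D, d, tau = 12, 10, 0.5        # SR observation, CR observation, transfer factor
s, b1 = 5.0, 10.0              # signal and bkg1 yields in the SR
r = 1.0                        # fixed signal strength for this check

mu = np.linspace(1.0, 25.0, 200)               # CR background normalization
sr_exp = r * s + b1 + mu * tau                 # expected SR yield
nll_poisson = -poisson.logpmf(D, sr_exp) - poisson.logpmf(d, mu)
nll_gamma = -poisson.logpmf(D, sr_exp) - gamma.logpdf(mu * tau, a=d + 1, scale=tau)

diff = nll_poisson - nll_gamma
print(diff.min(), diff.max())   # the same number twice: a constant offset in mu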