cms-analysis / HiggsAnalysis-CombinedLimit

CMS Higgs Combination toolkit.
https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest
Apache License 2.0
75 stars 380 forks

Update bin-wise-stats.md #929

Open pfackeldey opened 5 months ago

pfackeldey commented 5 months ago

Dear combine experts,

this PR updates the description of the autoMCstats algorithm. Two statements are (likely) described more correctly now, both concerning the case where $n_{tot}^{eff}$ is below the threshold (Poisson-constrained case). Can you confirm that the description of the algorithm is correct now?

Best, Peter

pfackeldey commented 5 months ago

Dear combine experts,

as far as I understand, the following code block implements the autoMCstats algorithm for the Poisson case: https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/src/CMSHistErrorPropagator.cc#L363-L421

From my understanding this does not align with the description in the documentation: https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/part2/bin-wise-stats/#description-of-the-algorithm

Can you clarify how the algorithm is implemented for the case where $n_{tot}^{eff} < \mathrm{threshold}$ (Poisson case)?

Best, Peter

ajgilbert commented 5 months ago

Hi Peter, I think the description aligns with the code. Below the Poisson threshold for the sum of processes we do Poisson when the individual process is below the same threshold (this part), otherwise Gaussian (this part). There is also one subtle case not described in the docs: when the per-process error is larger than the bin content, we cannot form a Poisson uncertainty even if we wanted to, so we use a Gaussian instead (this part).

pfackeldey commented 5 months ago

Hi @ajgilbert,

thank you very much for your fast reply. I think I am still confused by the outer if condition: https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/src/CMSHistErrorPropagator.cc#L350. Isn't this the condition that decides if we are in the Poisson or Gauss case?

Oooh I think I got it... in case $n_{tot}^{eff} < \mathrm{threshold}$ we are in the Poisson case. But in the Poisson case one additionally checks if the bin count of each individual process ($i$) is also below this threshold: $n_{i}^{eff} = n_{i}^2 / e_{i}^2 < \mathrm{threshold}$. If yes: apply Poisson, if not: apply Gaussian.

Is my understanding now correct?

Best, Peter

ajgilbert commented 5 months ago

Yes, exactly that :-) The reason is that Gaussian pdfs are faster to evaluate than Poissons, so we prefer to use them when we can.
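For readers following the thread, the decision logic agreed on above can be sketched in a few lines of Python. This is only an illustration, not the code in `CMSHistErrorPropagator.cc`: the function names and return labels are hypothetical, the strict `<` comparison follows the wording used in this thread, and details such as rounding of the effective counts and handling of empty bins are left out. The treatment of the case where the total is at or above the threshold (a single Gaussian on the summed yield) is an assumption based on the linked documentation page.

```python
import math


def effective_events(n, e):
    """Effective number of unweighted events, n^2 / e^2, for yield n with error e."""
    return (n * n) / (e * e)


def choose_constraints(yields, errors, threshold):
    """Sketch of the per-bin choice of constraint terms (hypothetical helper).

    yields, errors: per-process bin content n_i and its uncertainty e_i.
    Returns one label for the whole bin, or one label per process.
    Assumes at least one process has a non-zero error.
    """
    n_tot = sum(yields)
    e_tot = math.sqrt(sum(e * e for e in errors))

    # Outer condition: enough effective events in the summed bin.
    # Assumption: a single Gaussian parameter is then used for the total.
    if effective_events(n_tot, e_tot) >= threshold:
        return ["gaussian_total"]

    # Otherwise decide process by process.
    labels = []
    for n_i, e_i in zip(yields, errors):
        if e_i <= 0.0:
            labels.append("none")               # no MC stat. uncertainty to model
        elif e_i > n_i:
            labels.append("gaussian_fallback")  # error > content: no Poisson possible
        elif effective_events(n_i, e_i) < threshold:
            labels.append("poisson")
        else:
            labels.append("gaussian")           # cheaper to evaluate than a Poisson
    return labels


# Three processes in one bin, with a hypothetical threshold of 20.
print(choose_constraints([4.0, 1.0, 0.5], [1.0, 0.2, 0.8], threshold=20))
```

The three processes in the usage line are chosen so that each per-process branch is exercised once: a low-statistics process, a well-populated one, and one whose error exceeds its content.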

pfackeldey commented 5 months ago

Thank you so much for your explanation @ajgilbert !

I still think that one sentence needs a revision in the documentation (last point 7):

- The Poisson-constrained parameters are expressed as a yield multiplier with nominal value one: $n_{tot}\cdot v$.
+ The Poisson-constrained parameters are expressed as a yield multiplier with nominal value one: $n_{i} \cdot v$.

Shouldn't the Poisson parameters act on each process individually?

If you don't mind I would go ahead and update this PR with:

Is this alright with you?