Check nominal LHE scale weight

ktht commented 3 years ago

Previously I verified that the LHE scale weight at index 4 (LHEScaleWeight[4]) was effectively equal to 1 in all events and samples. However, the bbww analysis discovered that this is not the case in some events, so I'll revisit the issue. If it turns out that we have the same problem, I'll probably compute the LHE scale shifts relative to that weight, i.e. as LHEScaleWeight[i] / LHEScaleWeight[4] with i running from 0 to 8, and rerun the post-production of the affected samples to recompute the event sums.
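A minimal Python sketch of the proposed relative calculation, assuming the nine scale weights of one event are available as a plain list (e.g. read from the NanoAOD `LHEScaleWeight` branch). The function name `relative_scale_weights` and the choice to raise on a zero nominal weight are illustrative assumptions, not the framework's actual code.

```python
from typing import List, Sequence

NOMINAL_INDEX = 4  # LHEScaleWeight[4], the nominal weight discussed in this thread


def relative_scale_weights(lhe_scale_weight: Sequence[float]) -> List[float]:
    """Return LHEScaleWeight[i] / LHEScaleWeight[4] for i = 0..8.

    The nominal entry of the result is exactly 1 by construction, so any
    deviation of the stored nominal weight from unity (including its sign)
    is divided out of every variation.
    """
    if len(lhe_scale_weight) != 9:
        raise ValueError(f"expected 9 scale weights, got {len(lhe_scale_weight)}")
    nominal = lhe_scale_weight[NOMINAL_INDEX]
    if nominal == 0.0:
        # The ratio is undefined; the caller has to decide how to treat such events.
        raise ZeroDivisionError("nominal LHE scale weight is zero")
    return [w / nominal for w in lhe_scale_weight]
```

Dividing by the nominal weight, rather than assuming it equals 1, keeps the variations meaningful as shifts relative to the nominal prediction even when the stored nominal weight is off.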

ktht commented 3 years ago

To quantify how many samples have a nominal LHE scale weight (i.e. LHEScaleWeight[4]) that differs significantly from 1, and how large an impact this has per sample, I dumped the nominal LHE scale weight from all samples whenever it differed from unity by more than 1% (i.e. was outside [0.99, 1.01]). About 8% of the samples, or 121 samples, contain at least one nominal LHE scale weight outside this range. To assess the impact on the normalization of each sample, I calculated the relative difference of the nominal LHE scale weight from unity with respect to the total event statistics of the sample:

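$$\delta = \frac{1}{N} \sum_{i=1}^{N} \left( w^{\mathrm{nom}}_{i} - 1 \right)$$

where $w^{\mathrm{nom}}_{i}$ is the nominal LHE scale weight (LHEScaleWeight[4]) of event $i$ and $N$ is the total number of events in the sample.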
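The per-sample scan can be sketched as below. It counts the events whose nominal weight falls outside [0.99, 1.01] and computes the normalization impact as the summed deviation of the nominal weight from unity divided by the number of events. That definition of the impact, the function name and the list-of-lists input are assumptions for illustration; the actual dump was presumably done with the framework's own tools.

```python
from typing import Iterable, Sequence, Tuple

NOMINAL_INDEX = 4
LOWER, UPPER = 0.99, 1.01  # the +/-1% window quoted in the thread


def nominal_weight_impact(events: Iterable[Sequence[float]]) -> Tuple[int, int, float]:
    """Scan the LHEScaleWeight arrays of one sample.

    Returns (number of events, number of events whose nominal weight lies
    outside [0.99, 1.01], relative impact on the normalization), where the
    impact is assumed to be sum_i (w_nom,i - 1) / N.
    """
    n_events = 0
    n_outside = 0
    total_deviation = 0.0
    for weights in events:
        nominal = weights[NOMINAL_INDEX]
        n_events += 1
        if not (LOWER <= nominal <= UPPER):
            n_outside += 1
        total_deviation += nominal - 1.0
    impact = total_deviation / n_events if n_events else 0.0
    return n_events, n_outside, impact
```

For example, a sample of 1000 events where a single event has a nominal weight of 0 gives an impact of -0.001, i.e. -0.1%.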

According to this, the impact on the normalization is of the order of 1e-5, i.e. negligible, in all samples except the single anti-/top samples, where it ranges from 0.13% to 0.19%. After dumping the minimum and maximum nominal LHE scale weights of every sample, it's clear to me that the extreme outliers occur only in the single anti-/top samples. What's even more bizarre is that:

- the largest nominal LHE scale weights can exceed 100;
- some nominal LHE scale weights are negative (example: 2018 ST_t-channel_antitop_4f_inclusiveDecays sample);
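```
***********************************************************************
*    Row   * Instance *       run * luminosit *     event * LHEScaleW *
***********************************************************************
*   270522 *        0 *         1 *    786437 *  78643623 * -56.77734 *
*   270522 *        1 *         1 *    786437 *  78643623 * -60.62109 *
*   270522 *        2 *         1 *    786437 *  78643623 * -66.78906 *
*   270522 *        3 *         1 *    786437 *  78643623 *  -80.0625 *
*   270522 *        4 *         1 *    786437 *  78643623 * -83.80078 *
*   270522 *        5 *         1 *    786437 *  78643623 * -88.57031 *
*   270522 *        6 *         1 *    786437 *  78643623 * -90.48437 *
*   270522 *        7 *         1 *    786437 *  78643623 * -94.05468 *
*   270522 *        8 *         1 *    786437 *  78643623 * -97.89843 *
***********************************************************************
```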
- the LHE scale variations can flip the sign: for instance, if the nominal LHE scale weight is negative, some other LHE scale weight in the same event can be positive (example: 2018 ST_t-channel_antitop_4f_inclusiveDecays sample);
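```
***********************************************************************
*    Row   * Instance *       run * luminosit *     event * LHEScaleW *
***********************************************************************
*   433638 *        0 *         1 *    229031 *  22903043 * 5.5539550 *
*   433638 *        1 *         1 *    229031 *  22903043 * 2.4246826 *
*   433638 *        2 *         1 *    229031 *  22903043 * -1.130676 *
*   433638 *        3 *         1 *    229031 *  22903043 * -3.847778 *
*   433638 *        4 *         1 *    229031 *  22903043 * -6.078857 *
*   433638 *        5 *         1 *    229031 *  22903043 * -8.490234 *
*   433638 *        6 *         1 *    229031 *  22903043 * -8.850585 *
*   433638 *        7 *         1 *    229031 *  22903043 * -10.51953 *
*   433638 *        8 *         1 *    229031 *  22903043 * -12.22900 *
***********************************************************************
```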
- the nominal LHE scale weight may be 0 (example: 2017 ST_t-channel_antitop_4f_inclusiveDecays sample).
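```
***********************************************************************
*    Row   * Instance *       run * luminosit *     event * LHEScaleW *
***********************************************************************
*    23415 *        0 *         1 *      1403 *    820187 *         0 *
*    23415 *        1 *         1 *      1403 *    820187 *         0 *
*    23415 *        2 *         1 *      1403 *    820187 *         0 *
*    23415 *        3 *         1 *      1403 *    820187 *         0 *
*    23415 *        4 *         1 *      1403 *    820187 *         0 *
*    23415 *        5 *         1 *      1403 *    820187 *         0 *
*    23415 *        6 *         1 *      1403 *    820187 *         0 *
*    23415 *        7 *         1 *      1403 *    820187 *         0 *
*    23415 *        8 *         1 *      1403 *    820187 * 0.0650329 *
***********************************************************************
```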
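The anomalies listed above can be flagged event by event with a small classifier like the one below. It is a hypothetical helper, not part of the framework: the label names are made up, and the threshold of 100 simply mirrors the magnitude quoted above. A sign flip is taken to mean that at least one of the nine weights has the opposite sign of a non-zero nominal weight.

```python
from typing import List, Sequence

NOMINAL_INDEX = 4
LARGE_WEIGHT = 100.0  # illustrative threshold, mirroring the ">100" observation


def classify_scale_weights(weights: Sequence[float]) -> List[str]:
    """Return labels for the anomalies that apply to one event's LHEScaleWeight array."""
    nominal = weights[NOMINAL_INDEX]
    labels = []
    if abs(nominal) > LARGE_WEIGHT:
        labels.append("large_nominal")
    if nominal < 0:
        labels.append("negative_nominal")
    if nominal == 0:
        labels.append("zero_nominal")
    # Sign flip: some weight has the opposite sign of a non-zero nominal weight.
    if nominal != 0 and any(w * nominal < 0 for w in weights):
        labels.append("sign_flip")
    return labels
```

Applied to the three dumps above, this returns `["negative_nominal"]`, `["negative_nominal", "sign_flip"]` and `["zero_nominal"]`, respectively.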

From what I can see, these effects are present in all single anti-/top samples, regardless of the era, which is why I propose the following plan:

- disable LHE scale variations for single anti-/top samples, because a significant number of events have untrustworthy LHE scale weights, and inform GEN about these observations;
- still implement the relative calculation of the LHE scale shifts as described in the initial post;
- do not recompute dedicated event sums for each LHE scale variation, because the effect of the previous point on the effective event statistics can be assumed negligible in all samples (except for the single anti-/top samples).

edit: OK, it looks like the 2018 ST_t-channel_top_4f_inclusiveDecays sample is the only single anti-/top sample that is free of the above issues.

2nd edit: only the t-channel single anti-/top samples are problematic.
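The first point of the plan, narrowed to the t-channel samples, amounts to a per-sample switch. A hypothetical sketch follows, assuming samples are identified by process names like the ones quoted above plus an era. The optional `exempt` set shows how a sample found to be clean (such as the 2018 top sample mentioned in the first edit) could be whitelisted; whether to do so is left open here.

```python
from typing import AbstractSet, Tuple


def lhe_scale_variations_enabled(
    sample_name: str,
    era: int,
    exempt: AbstractSet[Tuple[str, int]] = frozenset(),
) -> bool:
    """Hypothetical guard: False for t-channel single anti-/top samples unless exempted."""
    if not sample_name.startswith("ST_t-channel"):
        return True
    return (sample_name, era) in exempt
```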

ktht commented 3 years ago

I'm done with the coding part of this task. In addition to the above, I also updated the logic for calculating the event sums of the individual LHE scale variations. As mentioned before, this has a negligible effect on the normalization of the samples, but it's useful to have it implemented now for consistency in the future.
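A sketch of per-variation event sums built from the relative weights: each event contributes `genWeight * LHEScaleWeight[i] / LHEScaleWeight[4]` to sum `i`. The input format (pairs of gen weight and scale-weight list) and the fallback for a zero nominal weight, which treats the event as unaffected by the variations, are assumptions; the thread does not say how the updated logic handles such events.

```python
from typing import Iterable, List, Sequence, Tuple

NOMINAL_INDEX = 4
N_SCALE_WEIGHTS = 9


def scale_variation_sums(events: Iterable[Tuple[float, Sequence[float]]]) -> List[float]:
    """Sum genWeight * LHEScaleWeight[i] / LHEScaleWeight[4] over events, for i = 0..8."""
    sums = [0.0] * N_SCALE_WEIGHTS
    for gen_weight, scale_weights in events:
        if len(scale_weights) != N_SCALE_WEIGHTS:
            raise ValueError("expected 9 LHE scale weights per event")
        nominal = scale_weights[NOMINAL_INDEX]
        if nominal == 0.0:
            # Assumption: an event with a zero nominal weight is treated as unvaried.
            ratios = [1.0] * N_SCALE_WEIGHTS
        else:
            ratios = [w / nominal for w in scale_weights]
        for i, ratio in enumerate(ratios):
            sums[i] += gen_weight * ratio
    return sums
```

With this choice, the nominal sum (index 4) always equals the plain sum of gen weights, so only the variations differ from the nominal normalization.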

Regarding the extreme outliers in the single anti-/top samples, some of the same samples also have quite a broad distribution of gen weights, so maybe there's a connection (see the plots referenced in this comment).

Also, one event from 2018 DYToLL_2J sample apparently has a negative nominal LHE scale weight:

```
***********************************************************************
*    Row   * Instance *       run * luminosit *     event * LHEScaleW *
***********************************************************************
*    63183 *        0 *         1 *     25153 *  75456888 * -1.630554 *
*    63183 *        1 *         1 *     25153 *  75456888 * -1.587341 *
*    63183 *        2 *         1 *     25153 *  75456888 * -1.607727 *
*    63183 *        3 *         1 *     25153 *  75456888 * -0.990112 *
*    63183 *        4 *         1 *     25153 *  75456888 * -0.941528 *
*    63183 *        5 *         1 *     25153 *  75456888 * -0.943389 *
*    63183 *        6 *         1 *     25153 *  75456888 * -0.634063 *
*    63183 *        7 *         1 *     25153 *  75456888 * -0.588104 *
*    63183 *        8 *         1 *     25153 *  75456888 * -0.582275 *
***********************************************************************
```

Although the sign factors out in the end (in the ratio to the nominal weight), the minimum and maximum values are determined including the sign, so there's a slight inconsistency. It can be handled in LHEInfoReader, but I don't think it's going to make much of a difference.
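One way to remove that inconsistency, sketched here as an assumption rather than as what LHEInfoReader actually does, is to take the minimum and maximum of the ratios `LHEScaleWeight[i] / LHEScaleWeight[4]` instead of the raw weights. Because the ratios are invariant under an overall sign flip, the envelope of the DYToLL_2J event above comes out the same as it would for the sign-flipped event. The function name `scale_envelope` is hypothetical.

```python
from typing import Sequence, Tuple

NOMINAL_INDEX = 4


def scale_envelope(weights: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) of LHEScaleWeight[i] / LHEScaleWeight[4] over i = 0..8.

    Taking the envelope on the ratios means an overall sign flip of all
    weights does not swap the roles of the minimum and maximum.
    """
    nominal = weights[NOMINAL_INDEX]
    if nominal == 0.0:
        raise ZeroDivisionError("nominal LHE scale weight is zero")
    ratios = [w / nominal for w in weights]
    return min(ratios), max(ratios)
```

For the event above, the raw minimum is the weight at index 0 (-1.630554), whereas the ratio-based maximum (about 1.73) comes from index 0 and the minimum (about 0.62) from index 8.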

ktht commented 3 years ago

HN thread: https://hypernews.cern.ch/HyperNews/CMS/get/generators/5165.html

HEP-KBFI / tth-htt #174