Cuts with LHCb data in NNPDF4.0 NLO

enocera commented 2 years ago

There seems to be a problem with the implementation of the kinematic cuts on the LHCb data in the NLO fit. The problems are as follows.

* with LHCBWZMU8TEV, LHCBWZMU7TEV, LHCB_Z_13TEV_DIELECTRON, LHCB_Z_13TEV_DIMUON: points at y<2.25/2.20 are excluded from the NLO fit (while they shouldn't);
* with LHCBWZMU8TEV: one point at y~2.65 is excluded while it shouldn't.

cc @cschwan

cschwan commented 2 years ago

There's also something very strange going on with ATLAS Z pt 8 TeV in the rapidity distribution:

https://vp.nnpdf.science/1EFadUyES8KqGpZWRF-S-Q==/matched_datasets_from_dataspecs12_dataset_report_report.html

enocera commented 2 years ago

Indeed. Needless to say, we should check all the datasets entering the NLO fit.

cschwan commented 2 years ago

Your first point I didn't understand. Why shouldn't the NLO fit discard the low rapidity bins for LHCb if they are discarded at NNLO? I suppose your source of truth is table 4.1 in the paper?

enocera commented 2 years ago

Because the low rapidity bins are discarded only on account of an instability in the computation of the NNLO K-factor for these bins. At NLO there's no such instability, therefore the points should not be discarded. My source of truth is the filters.yaml file https://github.com/NNPDF/nnpdf/blob/eeae19b15c960b4f8f974c3ac2ba2af36ed845bc/validphys2/src/validphys/cuts/filters.yaml#L52 which is the file that controls cuts in fits. Table 4.1 should reflect this file (and I think that it does, insofar as LHCb data are concerned). To me the problem seems to be that the cuts defined in filters.yaml are not correctly propagated to the actual fit, and only in the NLO case.

siranipour commented 2 years ago

I believe we use the weird fromintersection cuts at NLO. So this is a plotting issue rather than an issue in the fits themselves. As you can see, the filter rules for these datasets only get applied at NNLO: https://github.com/NNPDF/nnpdf/blob/eeae19b15c960b4f8f974c3ac2ba2af36ed845bc/validphys2/src/validphys/cuts/filters.yaml#L58

enocera commented 2 years ago

Well, if I look at the tables of chi2, I see a correspondence between the number of data points included in the fit and the number of points (not) displayed in the plots. So the two are consistent, and I would rule out a plotting issue.

I believe that the issue (if any) is a little more subtle. In the specific case of LHCb, in an NLO fit, we have two rules:

* the rule in filters.yaml, from which we wouldn't expect a cut in the NLO fit (as @siranipour observes, the rule is such that the cut is applied only at NNLO);
* the rule fromintersection, which is related to the size of the NNLO correction. Now, I think that what happens is that, because the NNLO correction is not reliable, fromintersection automatically excludes these points in the NLO fit as well, even if the rule in filters.yaml does not. But we have a solution to this: declare (in the fit runcard) which datasets should not be subject to the fromintersection rule.

So all in all, I think that there's no issue at all: the points excluded from the NLO fit are excluded because the NNLO correction is large, and this is a consequence of the fromintersection rule. Let me check this explicitly.
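
To make the interplay of the two rules concrete, here is a minimal sketch, not the validphys implementation. All names (`Point`, `explicit_rule_keeps`, `kept_in_fit`, `exempt`, `nnlo_check_ok`) are hypothetical, and the rapidity threshold only echoes the y<2.25 figure quoted in this thread; the real rules live in filters.yaml and in the fit runcard.

```python
from dataclasses import dataclass

# Illustrative only: an explicit rule that is active at NNLO and nowhere else.
# The dataset name and the 2.25 value echo the thread; the mapping is assumed.
NNLO_ONLY_MIN_RAPIDITY = {"LHCBWZMU8TEV": 2.25}


@dataclass(frozen=True)
class Point:
    dataset: str
    rapidity: float
    nnlo_check_ok: bool  # outcome of the check on the NNLO correction


def explicit_rule_keeps(point: Point, order: str) -> bool:
    """Stand-in for an explicit cut that only applies at NNLO."""
    if order != "NNLO":
        return True
    minimum = NNLO_ONLY_MIN_RAPIDITY.get(point.dataset)
    return minimum is None or point.rapidity >= minimum


def kept_in_fit(point: Point, order: str, exempt=frozenset()) -> bool:
    """Combine the explicit rule with the NNLO-correction check."""
    if not explicit_rule_keeps(point, order):
        return False
    # At NLO the second rule still applies, unless the dataset is exempted.
    if order == "NLO" and point.dataset not in exempt:
        return point.nnlo_check_ok
    return True


low_y = Point("LHCBWZMU8TEV", rapidity=2.1, nnlo_check_ok=False)
print(kept_in_fit(low_y, "NNLO"))                           # cut by the explicit rule
print(kept_in_fit(low_y, "NLO"))                            # cut by the second rule
print(kept_in_fit(low_y, "NLO", exempt={"LHCBWZMU8TEV"}))   # kept once exempted
```

In this toy model the low-rapidity point drops out of the NLO fit even though the explicit rule does not apply at NLO, which is the behaviour described above; exempting the dataset restores it.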

enocera commented 2 years ago

@cschwan I think that there are missing points in the NLO fit because, on top of the cuts displayed in Table 4.1, we also have an additional cut in the NLO fit, as explained in the second paragraph of Sect. 4.1: "In the case of QCD higher-order corrections, we compute, for each data point, the ratio between the absolute difference of the NNLO and NLO predictions to the experimental uncertainty. If this quantity is smaller than a given threshold value, the data point is retained in the NLO fit, otherwise it is discarded. We examined two alternative values of the threshold, 1 and 2 respectively. We concluded that a value of 1 is unnecessarily aggressive, as it leads to discarding an excessive number of data points from the NLO fit, while a value of 2 ensures that a reasonable number of data points are retained in the fit with reasonable theoretical accuracy. We therefore use 2 as our default threshold value."

What I'll do is to recheck explicitly that this is the case. If so, all is fine and we can happily close this issue.
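
For reference, the criterion quoted from Sect. 4.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the fits: the function names are hypothetical and the numbers are invented toy values.

```python
def similarity_ratio(nlo: float, nnlo: float, exp_unc: float) -> float:
    """|NNLO - NLO| divided by the experimental uncertainty of the point."""
    if exp_unc <= 0:
        raise ValueError("experimental uncertainty must be positive")
    return abs(nnlo - nlo) / exp_unc


def retained_at_nlo(nlo: float, nnlo: float, exp_unc: float,
                    threshold: float = 2.0) -> bool:
    """A point is retained in the NLO fit if the ratio is below the threshold."""
    return similarity_ratio(nlo, nnlo, exp_unc) < threshold


# Toy (NLO, NNLO, uncertainty) triples, invented for illustration.
toy_points = [(10.0, 10.25, 0.5), (10.0, 10.5, 0.5), (10.0, 11.5, 0.5)]

for threshold in (1.0, 2.0):
    kept = [p for p in toy_points if retained_at_nlo(*p, threshold=threshold)]
    print(f"threshold={threshold}: {len(kept)} of {len(toy_points)} points kept")
```

With these toy inputs the ratios are 0.5, 1.0 and 3.0, so a threshold of 1 keeps one point and a threshold of 2 keeps two, which mirrors the paper's remark that the tighter threshold discards more data.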

enocera commented 2 years ago

@cschwan This is a non-issue. The extra points removed in NLO fits from, e.g., the LHCb and the ATLAS Z pT data are indeed removed because of the similarity cut, as explained in the NNPDF4.0 paper. Here's a data-theory comparison plot at NLO w/o the similarity cut: https://vp.nnpdf.science/L4ezSFlaQIiF9r4WfMnZBw==.

cschwan commented 2 years ago

> There seems to be a problem with the implementation of the kinematic cuts on the LHCb data in the NLO fit. The problems are as follows.
> * with LHCBWZMU8TEV, LHCBWZMU7TEV, LHCB_Z_13TEV_DIELECTRON, LHCB_Z_13TEV_DIMUON:
>   points at y<2.25/2.20 are excluded from the NLO fit (while they shouldn't);
>
> * with LHCBWZMU8TEV:
>   one point at y~2.65 is excluded while it shouldn't.
>   cc @cschwan

So, concerning the first point, does that mean that table 4.1 is wrong?

enocera commented 2 years ago

I'd say that table 4.1 is inaccurate, indeed. The cut is also applied at NLO, though indirectly through the fact that the similarity cut prevails (in that points for which the NNLO correction cannot be checked are excluded a priori also from the NLO fit). It's not completely clear to me how to update the text/table in Sect. 4.1. Let me think about this.

cschwan commented 2 years ago

I already updated the table yesterday, except for the DIS/FTDY experiments and the Z pt measurements from ATLAS/CMS, so let me know how we should proceed. I haven't pushed the change yet, but I can do that if you want it.

enocera commented 2 years ago

Please go ahead and push, then I'll iterate, if needed. I think that the problem with Table 4.1 is that it is not clear at all that, on top of what is listed in the table, there is the NLO/NNLO similarity cut. As a bare minimum, this should be made clear in the table caption.

cschwan commented 2 years ago

Here you go: https://github.com/NNPDF/papers/commit/019f89b934894d6d644fc3df33f64bccf4b7a44e

Cuts with LHCb data in NNPDF4.0 NLO #1480