NNPDF / theories

Contains all ingredients (grid, operator card, dataset definition) necessary to regenerate any theory using the pineko structure.

FTDY instability issues #6

Open · scarlehoff opened this issue 1 year ago

scarlehoff commented 1 year ago

Let's collect here (or in an associated wiki page) the information about the situation of the FTDY grids in the MHOU fits (see here). I would like to know whether this is something that we can fix in vrap.

I've had a quick look at the following grid from theory 400, DYE866P.pineappl.lz4 (so it includes only NLO).

When doing pineappl convolute with NNPDF4.0 we get quite reasonable results for all datapoints (I'm not copying them here, but the central values are all positive and the scale variations stay in the 10-20% range). One example:

pineappl convolute DYE866P.pineappl.lz4 NNPDF40_nnlo_as_01180

124 12.35 12.35 0.748651 0.748651       1.2137391e-1   -17.59    22.61
125 13.85 13.85 0.679481 0.679481       6.7099958e-2   -18.69    24.38
126 15.85 15.85 0.604361 0.604361       2.8980554e-2   -20.07    26.64

However, if I now use 190310-tg-nlo-global with pineappl convolute, some points are very similar, while others become negative or receive very large scale corrections (the last three columns are the central value and the -/+ scale variations of the 7-point prescription; see the sketch after the output):

pineappl convolute DYE866P.pineappl.lz4 190310-tg-nlo-global

124 12.35 12.35 0.748651 0.748651       2.0278953e-2   -72.45   112.19
125 13.85 13.85 0.679481 0.679481      -4.8924728e-2     4.59    -4.58
126 15.85 15.85 0.604361 0.604361      -6.4598359e-2    17.67   -15.38
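
For reference, the last three columns can be reproduced from the individual scale-varied predictions. A minimal sketch, assuming the seven predictions per bin have already been obtained separately (how pineappl computes these columns internally is an assumption here):

```python
import numpy as np

def seven_point_envelope(preds):
    """preds: array of shape (npoints, 7), with column 0 the central
    scale choice and columns 1-6 the remaining (mu_R, mu_F)
    combinations of the 7-point prescription. Returns the central
    value and the min/max envelope as percent shifts, i.e. the last
    three columns of the convolute output above."""
    preds = np.asarray(preds)
    cv = preds[:, 0]
    lo = preds.min(axis=1)
    hi = preds.max(axis=1)
    return cv, (lo - cv) / cv * 100.0, (hi - cv) / cv * 100.0
```

Note that dividing by a negative central value flips the sign of the relative shifts, which would be consistent with the inverted -/+ columns of points 125 and 126 above.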

Note that this is with the NLO grid. With the NNLO grids the effect is less pronounced (but still there).

This is coming from negative points in the PDF. When I convolute with 190310-tg-nlo-global with --force-positive I find results compatible with NNPDF4.0. So @andreab1997, one possible solution (regardless of other cuts that might be implemented) would be to apply a "positive cutoff" in the computation of the MHOU. On the one hand this is reasonable (a negative cross section is unphysical), but it might be a bit challenging to do while keeping the uncertainties perfectly Gaussian. A sketch of what this could look like is given after the output below.

pineappl convolute DYE866P.pineappl.lz4 190310-tg-nlo-global --force-positive

124 12.35 12.35 0.748651 0.748651       1.3515310e-1   -17.79    23.27
125 13.85 13.85 0.679481 0.679481       6.9460581e-2   -19.38    25.07
126 15.85 15.85 0.604361 0.604361       2.6455519e-2   -21.89    28.98
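
For concreteness, a minimal numpy sketch of such a positive cutoff applied before building a scale-variation covariance matrix (this is a toy construction with hypothetical names, not the actual theory-covariance implementation):

```python
import numpy as np

def apply_positive_cutoff(preds, cutoff=0.0):
    """Clip negative predictions to the cutoff before the MHOU
    covariance matrix is built. Note that clipping is not
    Gaussian-preserving: it skews the distribution of shifts,
    which is exactly the caveat raised above."""
    return np.clip(preds, cutoff, None)

def scale_variation_covmat(preds):
    """Toy covariance built from the shifts of each scale choice with
    respect to the central one (column 0); the normalisation and the
    treatment of correlated scales depend on the prescription."""
    preds = np.asarray(preds)
    shifts = preds[:, 1:] - preds[:, [0]]
    return shifts @ shifts.T / shifts.shape[1]

# e.g.: covmat = scale_variation_covmat(apply_positive_cutoff(preds))
```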
scarlehoff commented 1 year ago

The fact that --force-positive fixes the problem makes me think the origin of this problem is the same as the one seen by @RoyStegeman in the charm asymmetry studies. It's stronger in SeaQuest, since it happens for all datapoints (and thus completely destroys the $\chi^2$), but it was probably there also for the other FTDY sets.

enocera commented 1 year ago

Thanks @scarlehoff. I want to take the opportunity of this issue to clarify what I was discussing on the phone call, concerning the cut on $\tau$.

In NNPDF4.0, we apply to all FT DY data the cut $\tau\leq 0.08$ (see Sect. 4.1 of the NNPDF paper), that is, we retain only the points for which the inequality is satisfied. The definition of $\tau$ is as follows: $$\tau = \frac{M^2}{\sqrt{s}}$$ where $M$ is the invariant mass of the leptonic pair and $\sqrt{s}$ is the centre-of-mass energy.

Because the cross sections corresponding to these data sets are for $$\frac{d\sigma}{dM^2\,dy}(M^2,y),$$ I would consider the maximal excursion of the fact./ren. scale variations around the central choice $\mu_F^2=\mu_R^2=M^2$. I would then re-define the cut on $\tau$ by replacing $M^2\to 4M^2$ or $M^2\to M^2/4$. Now, only the first case is more restrictive than the nominal cut (the replacement $M^2\to 4M^2$ tightens it by a factor of 4), therefore I would take $$\tau\leq 0.02.$$

I guess that this is something that can be controlled here and in similar instances of the filters.yaml file, though perhaps we have to be careful, because this "more conservative cut" applies only to the set of fits studied for theory uncertainties. In the same spirit, I would raise the cut on $Q^2$, relevant to DIS data, by a factor of 4, as already done by @andreab1997.

For the case of DYE906, I guess that another modification is needed. Each of the six data points of this data set is the combination of 10 sub-points, all of which have different values of $\tau$. What I would do is to set to zero the sub-bins that do not fulfil the cut on $\tau$; this should be done consistently in both the numerator and the denominator. I expect that the central value would not change much, but numerical (and perturbative) stability should improve. A sketch of this sub-bin treatment is given below.
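
A minimal sketch of the sub-bin treatment, assuming the per-sub-point contributions and their $\tau$ values are available (all names are hypothetical, and the real combination presumably involves acceptance weights rather than a plain sum):

```python
import numpy as np

def combine_dye906_point(num_sub, den_sub, tau_sub, tau_cut=0.02):
    """Combine the 10 sub-points of one DYE906 data point into a single
    prediction, zeroing the sub-bins that fail the tau cut consistently
    in both the numerator and the denominator of the ratio."""
    mask = np.asarray(tau_sub) <= tau_cut
    num = np.where(mask, num_sub, 0.0).sum()
    den = np.where(mask, den_sub, 0.0).sum()
    return num / den
```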

cschwan commented 1 year ago

> The fact that --force-positive fixes the problem makes me think the origin of this problem is the same as the one seen by @RoyStegeman in the charm asymmetry studies. It's stronger in SeaQuest, since it happens for all datapoints (and thus completely destroys the $\chi^2$), but it was probably there also for the other FTDY sets.

I agree with this. If --force-positive makes any difference at all, that is due to negative PDFs, and that should only be a problem at large $x$. There we may have problems with the PDF, but I'd also expect interpolation errors to make a difference between the grids and the FK tables. Do we know the size of this?
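
One way to get a handle on the size would be to convolute the grid and the corresponding FK table with the same PDF and compare bin by bin. A minimal sketch, assuming the two prediction vectors have already been extracted (e.g. with pineappl convolute as above):

```python
import numpy as np

def interpolation_difference(grid_preds, fk_preds):
    """Per-bin relative difference (in percent) between the predictions
    from the original grid and from the FK table, both convoluted with
    the same PDF; the large-x bins would be the ones to watch."""
    grid_preds = np.asarray(grid_preds)
    fk_preds = np.asarray(fk_preds)
    return (fk_preds - grid_preds) / grid_preds * 100.0
```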

andreab1997 commented 1 year ago

> Thanks @scarlehoff. I want to take the opportunity of this issue to clarify what I was discussing on the phone call, concerning the cut on $\tau$.
>
> In NNPDF4.0, we apply to all FT DY data the cut $\tau\leq 0.08$ (see Sect. 4.1 of the NNPDF paper), that is, we retain only the points for which the inequality is satisfied. The definition of $\tau$ is as follows: $$\tau = \frac{M^2}{\sqrt{s}}$$ where $M$ is the invariant mass of the leptonic pair and $\sqrt{s}$ is the centre-of-mass energy.
>
> Because the cross sections corresponding to these data sets are for $$\frac{d\sigma}{dM^2\,dy}(M^2,y),$$ I would consider the maximal excursion of the fact./ren. scale variations around the central choice $\mu_F^2=\mu_R^2=M^2$. I would then re-define the cut on $\tau$ by replacing $M^2\to 4M^2$ or $M^2\to M^2/4$. Now, only the first case is more restrictive than the nominal cut (the replacement $M^2\to 4M^2$ tightens it by a factor of 4), therefore I would take $$\tau\leq 0.02.$$
>
> I guess that this is something that can be controlled here and in similar instances of the filters.yaml file, though perhaps we have to be careful, because this "more conservative cut" applies only to the set of fits studied for theory uncertainties. In the same spirit, I would raise the cut on $Q^2$, relevant to DIS data, by a factor of 4, as already done by @andreab1997.
>
> For the case of DYE906, I guess that another modification is needed. Each of the six data points of this data set is the combination of 10 sub-points, all of which have different values of $\tau$. What I would do is to set to zero the sub-bins that do not fulfil the cut on $\tau$; this should be done consistently in both the numerator and the denominator. I expect that the central value would not change much, but numerical (and perturbative) stability should improve.

I perfectly agree. Just a silly question: is the definition $\tau = M^2/s$ rather than $M^2/\sqrt{s}$?

enocera commented 1 year ago

> Is the definition $\tau = M^2/s$ rather than $M^2/\sqrt{s}$?

@andreab1997 You're right. The definition is $\tau=M^2/s$; see Eq. (5) in https://arxiv.org/pdf/1002.4407.pdf.
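
With the definition settled, a minimal sketch of the tightened kinematic cut proposed above (the function name and signature are hypothetical, not the actual filters.yaml machinery):

```python
def passes_tau_cut(m2, s, tau_max=0.08, scale_fact=4.0):
    """Nominal NNPDF4.0 cut: tau = M^2 / s <= 0.08. For the MHOU fits
    the proposal above tightens it by the scale-variation factor,
    i.e. tau <= 0.08 / 4 = 0.02."""
    return m2 / s <= tau_max / scale_fact
```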