Clarify initial states and PDFs used

cschwan commented 2 years ago

PineAPPL's metadata contains the keys initial_state_1 and initial_state_2, which are the PDG Monte Carlo IDs of the PDFs that the grid must be convolved with.

However, I now realise that the name is confusing, because the PDFs do not necessarily agree with the actual hadronic initial states. For instance, for processes that collide lead nuclei with protons where we want to fit only the proton, yadism will use isospin symmetry to convert the lead nucleus into an 'effective proton in lead', so that initial_state_1 = 2212 and initial_state_2 = 2212.

I therefore suggest that we replace initial_state_1 and initial_state_2 with pdf1 and pdf2, and add the following additional keys:

* in1: the actual initial state 1 of the collision as a PDG MC ID
* in2: the same for initial state 2

If in1 and pdf1 and/or in2 and pdf2 are different, the situation described above applies and the grid makes assumptions about the 'nuclear model'; for this we should document the mass number (A) and the number of protons (Z), for instance in the following way:

nuclear_model_1: A=2,Z=1

for deuterons.

Furthermore, if we have leptons or other non-hadronic particles in the initial state, the corresponding pdf1/pdf2 should be present but empty.
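A minimal Python sketch of how a consumer could interpret these keys, assuming the proposed names in1/in2 and pdf1/pdf2 and string-valued metadata. The helper `needs_nuclear_model` and the example dictionaries are hypothetical and not part of PineAPPL; 1000822080 is the PDG nuclear code for lead-208.

```python
def needs_nuclear_model(metadata: dict[str, str], index: int) -> bool:
    """Return True if the PDF differs from the actual initial state.

    Hypothetical helper: `metadata` mimics the proposed keys
    in1/in2 and pdf1/pdf2, with every value stored as a string.
    """
    actual = metadata[f"in{index}"]
    pdf = metadata[f"pdf{index}"]
    # An empty PDF entry marks a non-hadronic initial state (e.g. a lepton),
    # for which no convolution and hence no nuclear model is needed.
    if pdf == "":
        return False
    return actual != pdf


# Proton-lead collision, fitted with an 'effective proton in lead'.
proton_lead = {"in1": "2212", "pdf1": "2212", "in2": "1000822080", "pdf2": "2212"}

# Electron-proton DIS: the lepton side has an empty PDF entry.
electron_proton = {"in1": "11", "pdf1": "", "in2": "2212", "pdf2": "2212"}

print(needs_nuclear_model(proton_lead, 1))
print(needs_nuclear_model(proton_lead, 2))
print(needs_nuclear_model(electron_proton, 1))
```

With these example values only the second initial state of the proton-lead grid would require a documented nuclear model.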

cschwan commented 2 years ago

CC @felixhekhorn @AleCandido @scarlehoff @Radonirinaunimi

alecandido commented 2 years ago

That's a perfect solution for me.

The only additional proposal is that, since PineAPPL grid metadata values are always strings, I would make nuclear_model_x valid JSON, for parsing simplicity. E.g., for a deuteron:

nuclear_model_1: {"A": 2, "Z": 1}
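To illustrate the parsing simplicity, here is a short Python sketch using only the standard library. The key name nuclear_model_2, the function `parse_nuclear_model`, and the plain dictionary standing in for the grid metadata are assumptions for illustration, not an existing PineAPPL interface.

```python
import json


def parse_nuclear_model(value: str) -> tuple[int, int]:
    """Parse a JSON-encoded nuclear model string into (A, Z)."""
    model = json.loads(value)
    a, z = int(model["A"]), int(model["Z"])
    # Basic sanity check: the proton number cannot exceed the mass number.
    if not 0 <= z <= a:
        raise ValueError(f"inconsistent nuclear model: A={a}, Z={z}")
    return a, z


# Stand-in for the string-valued grid metadata (hypothetical content).
metadata = {"nuclear_model_2": '{"A": 208, "Z": 82}'}

a, z = parse_nuclear_model(metadata["nuclear_model_2"])
print(a, z)
```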

Radonirinaunimi commented 2 years ago

Thanks a lot @cschwan for this. This proposal is also perfect for me (this would make my life infinitely easier).

> The only additional proposal is that, since PineAPPL grid metadata values are always strings, I would make nuclear_model_x valid JSON, for parsing simplicity. E.g., for a deuteron:
>
> nuclear_model_1: {"A": 2, "Z": 1}

I also very much like this way of representing the metadata, which would also make validphys very happy.

cschwan commented 2 years ago

@AleCandido @Radonirinaunimi : yes, let's do that!

scarlehoff commented 2 years ago

For the time being (to get the ball rolling) I will write a script to burn in the metadata a posteriori. Instead of doing it as in this PR https://github.com/NNPDF/nnpdf/pull/1632 (where the information is put in the fit runcard), the relevant theory will be modified to contain the metadata as discussed in this issue.

That way when this is implemented in pineappl (issue #118?) the number of changes in vp will be minimal (maybe the way in which the information is retrieved is changed, but nothing beyond that).

cschwan commented 1 year ago

Another problem that we should keep in mind is that

* real protons and
* 'protons as the average nucleus' in nuclei

both unfortunately have the same PDG number, and that leads to potential problems in Grid::optimize; it assumes that all protons are equal, and that clearly isn't the case here. @Radonirinaunimi this might be a problem that you've already stumbled over.

Radonirinaunimi commented 1 year ago

> Another problem that we should keep in mind is that
>
> * real protons and
> * 'protons as the average nucleus' in nuclei
>
> both unfortunately have the same PDG number, and that leads to potential problems in Grid::optimize; it assumes that all protons are equal, and that clearly isn't the case here. @Radonirinaunimi this might be a problem that you've already stumbled over.

@cschwan Is this a problem at the level of storing the partonic bits or at the level of the convolution? I am not sure whether the following applies here, but the way I've usually dealt with the two scenarios so far is to always generate grids for the real/free protons and account for the isospin asymmetry later.

cschwan commented 1 year ago

Let's say you have a proton-lead collision, and generate your grid using initial_state_1 and initial_state_2 set to 2212. In that case you should generate a grid where, for instance, u u~ is treated differently from u~ u because both quarks come from different hadrons. This becomes a problem when you optimize the grid because PineAPPL sees that the initial-state PDG IDs are both 2212, and therefore symmetrizes by merging u~ u into u u~. However, this is wrong, because the two 'protons' aren't the same. This could for instance mean that for DY all quarks come from the first hadron, and all anti-quarks from the second hadron. If the two hadrons aren't actually the same you'll get wrong numbers.

In practice, you can check this by running your analysis once with your default grid and once with a grid that you've made sure is not optimized.
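The effect can be seen in a toy Python sketch. It is not PineAPPL code: the x dependence is collapsed to a single number per flavour, the weights and PDF values are made up, and `convolve` and `symmetrize` are hypothetical helpers that only mimic the merging of u~ u into u u~.

```python
U, UBAR = 2, -2  # PDG IDs of the up quark and up antiquark


def convolve(channels, f1, f2):
    """Sum weight * f1[a] * f2[b] over all partonic channels (a, b)."""
    return sum(w * f1[a] * f2[b] for (a, b), w in channels.items())


def symmetrize(channels):
    """Merge each channel (a, b) with a < b into its mirror (b, a)."""
    merged = {}
    for (a, b), w in channels.items():
        key = (a, b) if a >= b else (b, a)
        merged[key] = merged.get(key, 0.0) + w
    return merged


# Made-up weights: 'u u~' and 'u~ u' contribute differently.
channels = {(U, UBAR): 1.0, (UBAR, U): 3.0}

# Made-up PDF values for a free proton and an 'effective proton in lead'.
free_proton = {U: 0.5, UBAR: 0.1}
bound_proton = {U: 0.4, UBAR: 0.2}

# Identical hadrons: symmetrizing leaves the result unchanged.
pp_full = convolve(channels, free_proton, free_proton)
pp_sym = convolve(symmetrize(channels), free_proton, free_proton)

# Different hadrons with the same PDG ID: symmetrizing changes the result.
pa_full = convolve(channels, free_proton, bound_proton)
pa_sym = convolve(symmetrize(channels), free_proton, bound_proton)

print(pp_full, pp_sym)
print(pa_full, pa_sym)
```

In this toy the two proton-proton numbers agree, while the two proton-nucleus numbers do not, which is the discrepancy the comparison above would reveal.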

Radonirinaunimi commented 1 year ago

That's right! But there is actually a way around this, which is the procedure adopted by nNNPDF in previous releases. That is, the grids are always generated using $ep$ or $pp$, and to get to $eA$ or $pA$ one convolves the grids with:

$$ f^A(x) = Z f^{p/A}(x) + (A - Z) f^{n/A}(x)$$

with $f^{p/A}(x)$ and $f^{n/A}(x)$ denoting the bound-proton and bound-neutron PDFs, respectively. Doing so of course assumes that all the nuclear datasets are corrected for isoscalarity even if $A \neq 2Z$.
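As a sketch of the combination above, the following Python snippet builds $f^A$ from two callables. The function `nuclear_pdf` and the toy shapes used for $f^{p/A}$ and $f^{n/A}$ are made up for illustration and are not real PDFs or an existing API.

```python
def nuclear_pdf(f_p, f_n, a, z):
    """Return f^A(x) = Z * f^{p/A}(x) + (A - Z) * f^{n/A}(x)."""

    def f_a(x):
        return z * f_p(x) + (a - z) * f_n(x)

    return f_a


# Toy stand-ins for a single flavour of the bound-proton and
# bound-neutron PDFs (made-up shapes, for illustration only).
def bound_proton(x):
    return (1.0 - x) ** 3


def bound_neutron(x):
    return 0.5 * (1.0 - x) ** 3


# Lead-208: A = 208, Z = 82, hence 126 neutrons.
f_lead = nuclear_pdf(bound_proton, bound_neutron, a=208, z=82)
print(f_lead(0.5))
```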

cschwan commented 1 year ago

@Radonirinaunimi if it's a problem, it would only be one for $ p A $ generated using $ p p $. In that case you probably shouldn't optimize.

alecandido commented 1 year ago

> @Radonirinaunimi if it's a problem, it would only be one for $ p A $ generated using $ p p $. In that case you probably shouldn't optimize.

However, this would be a problem for the Pineline, since at some point optimize() is called (in Pineko, I believe, while in Pinefarm it should be up to the selected external implementation).

cschwan commented 1 year ago

I agree. I think we should start investigating the size of the problem.

felixhekhorn commented 8 months ago

Just to echo the discussion from #265 and to summarize the situation: we need to replace initial_state_1 with a more sophisticated structure, which states:

* whether the hadron is space-like or time-like, i.e. in the initial state or final state
* whether the hadron is linearly polarized
* if and how nuclear corrections are taken into account
* and of course, we still need the PID
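One possible shape for such a structure, as a purely hypothetical Python sketch: the class name `ConvolutionInfo` and all field names are invented here to mirror the four points above and do not correspond to any existing PineAPPL type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConvolutionInfo:
    """Hypothetical replacement for a bare initial_state_N entry."""

    pid: int  # PDG MC ID of the hadron the grid is convolved with
    time_like: bool = False  # False: initial state, True: final state
    polarized: bool = False  # whether the hadron is polarized
    # (A, Z) if nuclear corrections are assumed inside the grid, else None
    nuclear_model: Optional[tuple[int, int]] = None


# A free proton in the initial state.
free_proton = ConvolutionInfo(pid=2212)

# An 'effective proton in lead': same PID, but no longer equal to a free proton.
proton_in_lead = ConvolutionInfo(pid=2212, nuclear_model=(208, 82))

print(free_proton == proton_in_lead)
```

Because the nuclear model takes part in the comparison, two entries with the same PID but different nuclear assumptions are not treated as identical, which is the distinction that a bare PDG ID cannot express.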

cschwan commented 5 months ago

This has been mostly implemented in https://github.com/NNPDF/pineappl/pull/287. For general nuclei we could add a new type in Convolution that specifies A and N.

NNPDF / pineappl

Clarify initial states and PDFs used #135