Polarized in CommonData

felixhekhorn commented 2 years ago

Dear all, since @juanrojochacon is thinking about kickstarting the polarized project in the mid-term future (~September), I was wondering whether we are ready for this in the new CommonData format - it is not a priority, but since I started thinking about this, I thought to open the issue right away ...

@enocera the obvious question first: how was polarized dealt in the past?
while (I guess) in the first iteration of this project we want to do a polarized-only DIS-only fit, we should probably not limit ourselves to these restrictions since we're rethinking everything at the moment ...

Let me list a few questions/points which are not obvious right away to me:

is the property "polarized" a part of the theory runcard? (and hence a property of what @enocera calls "theory", so a folder in the server)
- this would n3fit require eventually to deal with several theories in parallel (for doing polarized and unpolarized in parallel) - is this feasible @scarlehoff ?
- @Radonirinaunimi how are you doing for proton/nuclear? do you have multiple theories?
- @andreab1997 I know you have multiple theories - but not really inside the fit, right? you compute your covmat and then forget about the additional theories?
How is the property "polarized" reflected for a dataset?
- is it part of the name?
- this is currently not reflected by the naming scheme proposed by @cschwan ... we could say default is unpolarized and we add a _POL somewhere in the name if required
- the experiment name is not sufficient, since we expect the EIC to measure both polarized and unpolarized structure functions (now, to be fair, I guess name wise EIC ~ LHC, so the actual experiment would be called different, correct @enocera ? do we know the proposed detector names already? still, the problem might persist )
- is it part of the metadata.yaml? see #1500 (again currently not reflected)
- it has to be a part of pinecardsrunner as we have to define the observable there (e.g. g1)
- we could also add a meta data key there 'polarized': true @cschwan which, I guess, in practice would be sufficient but then the information would be very local and we most likely want to avoid that
- the information has also to be propagated to pineko as it needs to call eko with the correct settings
@cschwan concerning PineAPPL:
- I think, we're mostly ready with the library, correct?
- the (long-term) problem will be the coefficient function provider: for (fully-inclusive) DIS we can do something in yadism, but for double-hadronic (and anything else) it is a completely different story
@scarlehoff the information has to present inside the fit as e.g. sum rules and positivity depend on this property

cc @Zaharid @AleCandido

enocera commented 2 years ago

@enocera the obvious question first: how was polarized dealt in the past?

Polarised PDFs were determined using the pre-historic fortran code. In that code, there were two flags: "UNP" and "POL" - and the second was used to select the relevant data, splitting functions, kernels etc. However, the unpolarised code has evolved so much that, in my opinion, taking inspiration from what we did ~10 years ago does more harm than good.

Let me list a few questions/points which are not obvious right away to me:

is the property "polarized" a part of the theory runcard? (and hence a property of what @enocera calls "theory", so a folder in the server)

I'm not sure. I think that FK tables for polarised PDFs can be treated as FK tables for nuclear PDFs, in the sense that I'd rather have the FK tables for the unpolarised, nuclear and polarised PDFs in the same folder. Say: theory 444 is the pert. charm theory with certain values of the physical parameters for the unp, nuc and pol data sets. The fact that pol observables require different splitting functions and different kernels is something that is identified by the dataset name (within a theory), not by the theory itself.

this would n3fit require eventually to deal with several theories in parallel (for doing polarized and unpolarized in parallel) - is this feasible @scarlehoff ?

I personally don't think that this is needed.

How is the property "polarized" reflected for a dataset?

is it part of the name? I think that this should be part of the dataset name, indeed.

this is currently not reflected by the naming scheme proposed by @cschwan ... we could say default is unpolarized and we add a _POL somewhere in the name if required

the experiment name is not sufficient, since we expect the EIC to measure both polarized and unpolarized structure functions (now, to be fair, I guess name wise EIC ~ LHC, so the actual experiment would be called different, correct @enocera ? do we know the proposed detector names already? still, the problem might persist )

Well, think about what happens with RHIC, which measures proton-proton, proton-ion and ion-ion cross sections (with possibly polarised proton beams): the observable is different. In case of polarised proton collisions, RHIC measures a single or double spin asymmetry. In the case of an unpolarised collision, RHIC measures a cross section. I understand that @cschwan 's naming convention takes into account the name of the observable. And, in my opinion, that should be sufficient.

is it part of the metadata.yaml? see [WIP] Test implementation of new commondata format #1500 (again currently not reflected)

Not sure that we need a specification there - the dataset name should be sufficient.

Anyway, all this is perhaps a little premature - Gargnano will possibly be a good time to discuss the details, also in light of the experience we will get with the new commondata layout and with the proton+nuclear fit in the meantime.

cschwan commented 2 years ago

the (long-term) problem will be the coefficient function provider: for (fully-inclusive) DIS we can do something in yadism, but for double-hadronic (and anything else) it is a completely different story

What exactly changes on the process sidee when you use polarized PDFs?

enocera commented 2 years ago

the (long-term) problem will be the coefficient function provider: for (fully-inclusive) DIS we can do something in yadism, but for double-hadronic (and anything else) it is a completely different story

What exactly changes on the process side when you use polarized PDFs?

You have a different evolution (which, I guess, means that you have to replace unpolarised splitting functions with polarised splitting functions and adapt matching conditions in EKO). And of course you have different coefficient functions. Now, for DIS, as @felixhekhorn says, I guess that one can extend Yadism. For any other process it's not as clear (at least to me), given that public Monte Carlo generators usually do not allow the user to fix the polarisation of the incoming proton.

cschwan commented 2 years ago

we could also add a meta data key there 'polarized': true @cschwan which, I guess, in practice would be sufficient but then the information would be very local and we most likely want to avoid that

From the PineAPPL side we should make sure that the CLI can detect whether the user uses the correct PDF set (polarized PDFs for polarized predictions). But I'm afraid that LHAPDF doesn't have any support for polarized PDFs; for nuclear PDF there's the Particle field which we can read out to determine the nucleus, but for polarized PDFs I'm afraid this isn't documented anywhere.

cschwan commented 2 years ago

the (long-term) problem will be the coefficient function provider: for (fully-inclusive) DIS we can do something in yadism, but for double-hadronic (and anything else) it is a completely different story

What exactly changes on the process sidee when you use polarized PDFs?

To partially answer my own question: the matrix elements change, of course, since we only evaluate very specific (initial-state) polarizations.

However, must the PDF usage change as well? I found the following comment in the set description of NNPDFpol10_100:

Warning: only q+qbar and gluon combinations should be used. Valence combinations q-qbar should not be used

This warning isn't present in NNPDFpol11_100 anymore. Is this relevant?

Zaharid commented 2 years ago

Note that in principle one can write anything in LHAPDF headers, and that can be used in various ways. For example

https://github.com/Zaharid/mcscales_tools

enocera commented 2 years ago

However, must the PDF usage change as well? I found the following comment in the set description of NNPDFpol10_100:
Warning: only q+qbar and gluon combinations should be used. Valence combinations q-qbar should not be used
This warning isn't present in NNPDFpol11_100 anymore. Is this relevant?

This is because NNPDFpol10 was determined by fitting only neutral-current, photon mediated, polarised DIS data. And because of the way the observable factorises, one cannot disentangle quarks from antiquarks: coefficient functions always weight the sum of quarks and antiquarks. Because we were advised against modifying the LHAPDF reader to only return the sum of quarks and antiquarks, we added the warning in the .info file. In NNPDFpol11 we were instead sensitive to quarks and antiquarks, because we also fitted spin asymmetries for W boson production in polarised proton-proton collisions.

Radonirinaunimi commented 2 years ago

while (I guess) in the first iteration of this project we want to do a polarized-only DIS-only fit, we should probably not limit ourselves to these restrictions since we're rethinking everything at the moment ...

Yes! The plan will be to first tackle the polarized-only DIS fit, ie reproducing NNPDFpol1.0.

This would n3fit require eventually to deal with several theories in parallel (for doing polarized and unpolarized in parallel) - is this feasible @scarlehoff?

This is not needed! As @enocera said, everything can be in one single theory and use the naming to differentiate between one experiment (as you mentioned for example in the case of EIC). As a matter of fact, this is what I will be doing for the combined proton+nuclear fits (once the theory is pushed to the CERN server).

In addition, it would be much preferable if a mention on what is being fitted for a given data set is included in the metadata.yaml, something along the line of:

fit_type: unpolarized_proton/polarized_proton/nuclear/fragmentation_function

felixhekhorn commented 2 years ago

@enocera

Anyway, all this is perhaps a little premature - Gargnano will possibly be a good time to discuss the details, also in light of the experience we will get with the new commondata layout and with the proton+nuclear fit in the meantime.

agreed - as already said in my first paragraph :innocent: also as said there, I just wanted to clear my mind :upside_down_face:

in the sense that I'd rather have the FK tables for the unpolarised, nuclear and polarised PDFs in the same folder.

good - that should make the interface with n3fit easier

Well, think about what happens with RHIC, which measures proton-proton, proton-ion and ion-ion cross sections (with possibly polarised proton beams): the observable is different. In case of polarised proton collisions, RHIC measures a single or double spin asymmetry. In the case of an unpolarised collision, RHIC measures a cross section. I understand that @cschwan 's naming convention takes into account the name of the observable. And, in my opinion, that should be sufficient.

accepted - still, I do need a boolean flag, which at this point would be dealt inside pinecardsrunner as we define the observable there. This flag is the one that is passed to eko via pineko

felixhekhorn commented 2 years ago

In addition, it would be much preferable if a mention on what is being fitted for a given data set is included in the metadata.yaml, something along the line of:
fit_type: unpolarized_proton/polarized_proton/nuclear/fragmentation_function

I agree with that - this should make e.g. filtering much easier - otherwise you have to know that these given set of keywords mean polarized (or whatever); e.g. SASY (meaning spin asymmetry) is polarized, but WASY is not

alecandido commented 2 years ago

No need for separate flags or anything custom: we already make PineAPPL aware of the observable, we should just parse it in pineko.

On the other hand, unless we're really strict and explicit on allowed names for observables, it is true that we'll need some automated choices based on being polarized or not, so we can consider an explicit tag in the pineappl (grid) metadata.

Instead, pinecardsrunner will recognize on its own whether it is polarized or not, with information provided from yadism.

felixhekhorn commented 2 years ago

@cschwan

What exactly changes on the process sidee when you use polarized PDFs?

To partially answer my own question: the matrix elements change, of course, since we only evaluate very specific (initial-state) polarizations.

However, must the PDF usage change as well?

exactly - speaking from my DIS experience I'd say:

the matrix element changes (as you said)
the phase space is the same (except for a normalization eventually)
the splitting function changes (as @enocera said) and hence the way you renormalize collinear splittings
the PDF changes - meaning you have to ask to LHAPDF for a polarized set (knowing it is polarized)

juanrojochacon commented 2 years ago

I think LHAPDF has special IDs for polarised partons, but it has been a long time. We can check but this should not be a problem.

felixhekhorn commented 2 years ago

@Radonirinaunimi should we continue our discussion here, move to a dedicated issue or move to private?

Radonirinaunimi commented 2 years ago

@Radonirinaunimi should we continue our discussion here, move to a dedicated issue or move to private?

Let's first discuss privately and then put the proposal here. The only annoying thing is that this afternoon I'll be fully booked for meetings so shall we discuss on Monday?

felixhekhorn commented 2 years ago

After the discussion with @Radonirinaunimi we have a clearer picture of what we actually want:

@Radonirinaunimi is looking for a further discrimination in the data of the report (eg. chi2) (as far as I understand)
I'm looking for a sanity check between fit and dataset

and we concluded that we can solve both issue with the same proposal: we would like to introduce a new field into the CommonData definition: one possible name might be additional_fitting_dimensions (as a very verbose option and that can be of course negotiated). The value is a list of predefined options, that can be empty, but has to be unique (so no repetition)

possible values:

additional_fitting_dimensions: (empty) - the fit delivers objects with in two dimensions: flavor + x (the default situation suitable for unpolarized proton PDFs)
additional_fitting_dimensions: nuclear - the fit delivers objects with in three dimensions: flavor + x + A, suitable for nuclear PDF fits. The metadata only states the requirement that this additional dimension has to be there, but not how to collapse on this dimension, which instead is determined by the individual FK table.
additional_fitting_dimensions: polarized the fit delivers objects with in three dimensions: flavor + x + p, where p can take the values p=0=unpolarized and p=1=polarized. Again the metadata only states the requirement but not the collapse. For a typical spin asymmetry A_1=g_1/F_1 there would be two FK table one would say please contract me with the 2D tensor obtained by setting p=1 (->g1) and the other please contract me with the 2D tensor obtained by setting p=0 (->F1).

this is backward compatible since empty just means no additional dimensions (just flavor+x)
this is extensible since the moment we need a new dimension we can just add it to the possible values (say FF)
you can combine several options: say fitting polarized+nuclear+FF all in one go (as @juanrojochacon is thinking)

now,

this does solve my problem, since before loading a dataset you can check whether you can accommodate for that in the fit
and even chi2 can be grouped by this: as, I imagine, e.g. one would like to freeze all dimension first and then determine the default condition before opening other channels and typically I would expect the fit to behave quite different in the dimensions (e.g. due to the amount of data)

felixhekhorn commented 2 years ago

After discussing with @scarlehoff and @RoyStegeman (some time ago), they convinced me that my (technical) concerns can be dealt with: a given FK table has to provide the necessary informations and the necessary luminosities (in PineAPPL language)/flavor masks (in vp language). The non-trivial step of mapping this information on the NN content is vp business (and hence not my concern :innocent: ). The actual action item is to provide the necessary information in pinecardrunners (e.g. A and Z for nuclear via the theory card or similar; see also https://github.com/N3PDF/yadism/issues/122).

@Radonirinaunimi if you still want an additional field for the report side feel free to reopen this issue or open a new one ...

Radonirinaunimi commented 2 years ago

After discussing with @scarlehoff and @RoyStegeman (some time ago), they convinced me that my (technical) concerns can be dealt with: a given FK table has to provide the necessary informations and the necessary luminosities (in PineAPPL language)/flavor masks (in vp language). The non-trivial step of mapping this information on the NN content is vp business (and hence not my concern 😇 ). The actual action item is to provide the necessary information in pinecardrunners (e.g. A and Z for nuclear via the theory card or similar; see also N3PDF/yadism#122).

@felixhekhorn Yes! You might recall that we discussed at some point about how some of your technical concerns can be addressed from the theory side; and this also includes the A and Z dependence from my part (which are starting to be addressed ATM, see for example https://github.com/N3PDF/pineappl/issues/135).

@Radonirinaunimi if you still want an additional field for the report side feel free to reopen this issue or open a new one ...

The concerns I have with regards to having different grouping for the fit and/or plotting, unfortunately, can only included in the metadata. I can't think of any place else where such information could be stored. However, so far, I wasn't able to come up with the wisest solution to address @Zaharid's and @enocera's concerns. Hence, for the time being, I am happy for this to be closed; let's see when the actual issue resurfaces.

NNPDF / nnpdf

Polarized in CommonData #1559