NNPDF / nnusf

An open source machine learning framework that provides predictions for all-energy neutrino structure functions.
https://nnpdf.github.io/nnusf/
GNU General Public License v3.0

General plan & roadmap #1

Closed Radonirinaunimi closed 1 year ago

Radonirinaunimi commented 2 years ago

Below are a few things that we should give some thought to beforehand. These concern some of the technical difficulties I foresee, mainly in terms of implementation. There exist various solutions to each of them; we just need to find the optimal ones. Hence, this thread will serve not only as a place to collect ideas, but most importantly as a place to converge on final decisions concerning the various aspects of the implementation (design choices, etc.).

Fitting parametrization, masks, observables

Ideally, we would like to parametrize the three structure functions ($F_2$, $F_3$, and $F_L$) as the outputs of the NN. In this way, the object that we compute at the level of the $\chi^2$ is the double-differential cross section $d^2 \sigma / (dxdQ^2)$ (or equivalently $d^2 \sigma / (dxdy)$). However, for a given $(x,Q^2)$-pair, most of the datasets provide measurements of either $d^2 \sigma / (dxdy)$ or both $F_2$ and $F_3$ at the same time. This means that for some of the measurements, one needs to leave $F_L$ out of the computation. Here, we could do the same as in n3fit, meaning that for each measurement one specifies which structure functions are active (see the sketch below). Better suggestions on what should be compared at the $\chi^2$-level, and on how to implement it, are more than welcome.
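To make the masking idea concrete, here is a minimal numpy sketch (all names are hypothetical, not an actual nnusf interface): the NN outputs the triplet $(F_2, F_3, F_L)$ per kinematic point, and each dataset carries a boolean mask selecting which structure functions enter its observable.

```python
import numpy as np

# Index of each structure function in the NN output layer (assumed layout).
SF_INDEX = {"F2": 0, "F3": 1, "FL": 2}

def sf_mask(active):
    """Boolean mask over the NN outputs, e.g. active=("F2", "F3")."""
    mask = np.zeros(len(SF_INDEX), dtype=bool)
    for name in active:
        mask[SF_INDEX[name]] = True
    return mask

# Toy NN predictions for 4 kinematic points: shape (npoints, 3).
nn_output = np.random.rand(4, 3)

# A dataset that measures F2 and F3 but not FL: FL never enters its chi2,
# so no gradient flows through it for this dataset.
predictions = nn_output[:, sf_mask(("F2", "F3"))]  # shape (4, 2)
```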

Covariance matrix, correlations

Given that we will implement our own module(s) for the reading & parsing of the experimental datasets, we will also have to compute the covariance matrix ourselves, since we will no longer be able to rely on validphys.
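For the common case of statistical errors plus fully correlated additive systematics, the construction is short; a minimal sketch under that assumption (function and variable names are illustrative):

```python
import numpy as np

def build_covmat(stat, sys):
    """C_ij = delta_ij * stat_i**2 + sum_k sys_ik * sys_jk, i.e. a
    diagonal statistical part plus fully correlated systematics."""
    return np.diag(stat**2) + sys @ sys.T

# Toy numbers: 3 data points, 2 correlated systematic sources.
stat = np.array([0.10, 0.08, 0.12])
sys = np.array([[0.02, 0.01],
                [0.02, 0.00],
                [0.01, 0.03]])
covmat = build_covmat(stat, sys)  # shape (3, 3), symmetric
```

If a dataset only provides uncorrelated errors, this reduces to the diagonal piece.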

Custom/Early stopping

One of the technical parts we need to think about in terms of the fitting code is a simple implementation of the stopping. n3fit has a module that can perform custom callbacks on a PDF-like fitting model. However, this module is very complicated and contains a large amount of functionality we do not want. Thus, we need something simple enough that it is able to check all the various features we would like, on top of tracking the history of the $\chi^2$.

Theoretical constraints: sum rules, etc.

An important part of the fitting is the set of constraints that one can impose on the structure functions, both along the momentum fraction $x$ and along the energy scale $Q^2$. While the literature (as far as I have looked) provides no constraints along $Q^2$ (apart from the obvious fact that $F_i (x=1,Q^2)=0 \text{ } \forall \text{ } Q^2$), below are a couple of sum rules that we could potentially use:

Gross Llewellyn-Smith:

$$ \int_{0}^{1} dx \, F_3^{\nu N} \sim \int_{0}^{1} dx \left( u_v(x) + d_v(x) \right) = 3 $$

Adler:

$$ \int_{0}^{1} \frac{dx}{x} \left( F_{2}^{\bar{\nu} P} - F_{2}^{\nu P} \right) \sim 2 \int_{0}^{1} dx \left( u_v(x) - d_v(x) \right) = 2 $$

From momentum sum rules:

$$ x \Sigma(x) = F_{2}^{\nu N} \Longrightarrow \int_{0}^{1} dx \, F_{2}^{\nu N} \sim \frac{1}{2} $$
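If we did want to check a sum rule like Gross Llewellyn-Smith numerically on a fitted $F_3$, a simple quadrature would do. Here is a toy sketch where `toy_f3` is just a valence-like stand-in for the fitted structure function at fixed $Q^2$:

```python
from scipy.integrate import quad
from scipy.special import beta

# Toy valence-like shape, normalized so that its integral is exactly 3.
a, b = 0.5, 3.0
norm = 3.0 / beta(a + 1.0, b + 1.0)
toy_f3 = lambda x: norm * x**a * (1.0 - x) ** b

def gls_integral(f3, xmin=0.0):
    """Evaluate int_xmin^1 dx F3(x); GLS predicts ~3 (up to QCD corrections)."""
    value, _ = quad(f3, xmin, 1.0)
    return value

print(gls_integral(toy_f3))  # ~3.0 by construction
```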

alecandido commented 2 years ago

Also, the Adler sum rule is only approximate: the structure function is never the PDF at higher orders. (And there is not enough cancellation between the $\nu$ and $\bar{\nu}$.)

Radonirinaunimi commented 2 years ago

> Also, the Adler sum rule is only approximate: the structure function is never the PDF at higher orders. (And there is not enough cancellation between the $\nu$ and $\bar{\nu}$.)

Yep (especially since it also does not account for heavy flavors)! That was a typo and is fixed now.

RoyStegeman commented 2 years ago

Fitting parametrization, masks, observables

Here I think a mask (flavormap) as in n3fit might be the easiest solution. It doesn't really matter though: whatever method we choose, there will not be a gradient corresponding to a structure function that does not contribute to the chi2.

Custom/Early stopping

The stopping module is actually not that complicated; it consists of many lines because (as you say) n3fit includes much functionality that we are not interested in anyway. A good example of this is the fitting of multiple replicas.

For stopping I think we should indeed use a simple callback function. How fancy we want to make it is up to us, but in the simplest implementation a history class storing the chi2s of the fit will get us a long way.
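A minimal sketch of such a history class (hypothetical names, plain Python so it could be wrapped in a Keras callback or called directly from the training loop):

```python
class Chi2History:
    """Track the chi2 per epoch and signal patience-based stopping."""

    def __init__(self, patience=100):
        self.patience = patience
        self.history = []
        self.best_chi2 = float("inf")
        self.best_epoch = 0

    def register(self, epoch, chi2):
        """Record this epoch's chi2; return True if training should stop."""
        self.history.append(chi2)
        if chi2 < self.best_chi2:
            self.best_chi2 = chi2
            self.best_epoch = epoch
        return epoch - self.best_epoch >= self.patience
```

The training loop would then just call `register` once per epoch and break when it returns True.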

alecandido commented 2 years ago

> Fitting parametrization, masks, observables
>
> Here I think a mask (flavormap) as in n3fit might be the easiest solution. It doesn't really matter though: whatever method we choose, there will not be a gradient corresponding to a structure function that does not contribute to the chi2.

I agree. You could plug in a bunch of zeros, and that would be exactly the same, but masking is much more efficient, and standard in numpy (so I'm confident it is in tf as well).

Covariance Matrix, correlations

Here I believe we'll be able to do something much simpler: according to Juan's review most data are old, and they most likely don't provide correlated systematics, so it might be that we'll have variances, but no covariance at all. In any case, even if we were using vp, we would have had to implement covariances on our own, since each dataset might implement a different formula, and these are contained in the filters (plus cross-dataset correlations, but those are not that frequent).

We can support any case, since we just have to provide a matrix at the end, or even only some blocks, so not having to deal with vp internals will just speed things up: once we implement suitable formulas (which we would have had to do anyhow), we'll have our covmat.

Theoretical constraints: sum rules, etc.

Here I'm not sure we want to implement them: since they are only approximate, they might be inconsistent with our precision. In principle we could benchmark their accuracy in the perturbative regime, but it will be $\mathcal{O}(f^{LO} - f^{NLO})$, i.e. huge. We could implement them as a "hint", but this hint will already be contained (in the perturbative regime) in the PDFs used to generate predictions with yadism. We know they respect these constraints, because we imposed them ourselves ;) So, maybe, I'd give up completely on sum rules.

I wonder if Juan will be able to provide some information on the vanishing $Q^2$ limit.

RoyStegeman commented 2 years ago

Agreed with @AleCandido.

On the covmat thing: we may want to keep in mind that for the theory predictions we'll use as boundary condition, the covmat will be available.

On the th. constraints: I'm not in favor of enforcing approximations, but it might be worth checking how close we are a posteriori. Pinning the structure functions to 0 at $x=1$ we can of course do.

alecandido commented 2 years ago

> On the covmat thing: we may want to keep in mind that for the theory predictions we'll use as boundary condition, the covmat will be available.

Even more: since the boundary conditions are data, but possibly even yadism-generated pseudodata, we can exploit MC replicas to get a covmat even for them.
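A sketch of what that would amount to in practice, assuming an array of yadism predictions for the replicas on the boundary-condition grid (names and shapes are illustrative):

```python
import numpy as np

# Toy stand-in: predictions for 100 replicas on 50 (x, Q2) points.
replica_predictions = np.random.rand(100, 50)

# Central value and sample covariance over the replica ensemble.
central = replica_predictions.mean(axis=0)          # shape (50,)
covmat = np.cov(replica_predictions, rowvar=False)  # shape (50, 50)
```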

> On the th. constraints: I'm not in favor of enforcing approximations, but it might be worth checking how close we are a posteriori.

Good idea, I agree.

> Pinning the structure functions to 0 at $x=1$ we can of course do.

Definitely, but this we can even decide to hard-code (I'm not sure about now, but at least at some point in NNPDF, $NN(1)$ was subtracted from $NN(x)$).
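For reference, the subtraction trick is a one-liner; a toy sketch (the network here is just a stand-in):

```python
import numpy as np

def constrained_sf(nn, x):
    """Define the structure function as NN(x) - NN(1), so that it
    vanishes at x = 1 by construction."""
    return nn(x) - nn(1.0)

toy_nn = lambda x: np.tanh(2.0 * x) + 0.3  # stand-in for the fitted network
assert abs(constrained_sf(toy_nn, 1.0)) < 1e-12
```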

RoyStegeman commented 2 years ago

> Even more: since the boundary conditions are data, but possibly even yadism-generated pseudodata, we can exploit MC replicas to get a covmat even for them.

I thought that was how we were going to enforce the boundary conditions: with pseudodata, since there we can provide predictions on any $(x,Q)$ grid of our choice (I was indeed thinking of exploiting MC replicas to get a covmat). What data for the BC are you referring to?

> Definitely, but this we can even decide to hard-code (I'm not sure about now, but at least at some point in NNPDF, $NN(1)$ was subtracted from $NN(x)$).

That was indeed how I would enforce that constraint in practice ;)

alecandido commented 2 years ago

> What data for the BC are you referring to?

In principle, part of the datasets we're going to use is fully perturbative (e.g. part of CHORUS). But it's the same: in principle there might be some advantage in including that information directly (since part of it might not be consumed in a PDF fit), but since it has already been partially consumed, we would have to compute covariances between data and pseudodata. It's a mess: let's just use pseudodata, for which the covmat is the one we described above (computed through MC replicas), which will be much simpler.

alecandido commented 2 years ago

- Positivity
- Perturbative regime
- Covariance matrix

Radonirinaunimi commented 1 year ago

Closing as the main points here have all been addressed.