Matching to theory prediction

giacomomagni commented 2 years ago

This PR is for the implementation of the matching to theory predictions

giacomomagni commented 2 years ago

We should use the convention TYPE_DATASETNAME_OBSTYPE_EXTRA (where EXTRA can be an additional specificity of the dataset) for consistency and easier manipulation.

PS: I might have missed some places where this appear in the suggestions below.

Have you tried that this is working? I didn't append matching at the end, because there were some issue. But if you say this is the syntax I'm fine with it.

Radonirinaunimi commented 2 years ago

We should use the convention TYPE_DATASETNAME_OBSTYPE_EXTRA (where EXTRA can be an additional specificity of the dataset) for consistency and easier manipulation. PS: I might have missed some places where this appear in the suggestions below.

Have you tried that this is working? I didn't append matching at the end, because there were some issue. But if you say this is the syntax I'm fine with it.

I haven't tried this yet given that I expected that this should not change anything, but let me actually do the change locally.

giacomomagni commented 2 years ago

I haven't tried this yet given that I expected that this should not change anything, but let me actually do the change locally.

Don't worry I should have fixed, thanks in any case.

Radonirinaunimi commented 2 years ago

@RoyStegeman @giacomomagni

Some generic comments:

Given that the nNNPDF does not provide PDFs for $\rm{Ne}(A=20, Z=10)$, I used the nCTEQ15_20 for the BEBCWA59 datasets.
The CHARM datasets do not currently have the corresponding Yadism matching given that I was not sure which nPDF to use (we might want to rethink this later).

@RoyStegeman You can implement the part that computes the covmat here:

https://github.com/NNPDF/nnusf/blob/7c19abc3c715e97ce7e95853c387421872773ebc/src/nnusf/data/loader.py#L225-L227

The predictions (which contain the predictions for all replicas except replica_0) are stored in commondata/matching/ with names prepended by MATCH_<DATASET_NAME>.

@giacomomagni This PR needs thorough reviews in order to make sure that there are no bugs. In particular, when re-generating the coefficients, the NUTEV_DXDYNUU and NUTEV_DXDYNUB were also modified (I have not pushed them) which shouldn't.

RoyStegeman commented 2 years ago

@RoyStegeman You can implement the part that computes the covmat here:

Thanks, I'll take care of it once @giacomomagni is happy and this PR is closed. Currently trying to implement feature scaling to see if that makes a difference.

Radonirinaunimi commented 2 years ago

@giacomomagni There are still some problems with the BC data. I'll notify you when this is solved.

giacomomagni commented 2 years ago

@Radonirinaunimi okay I'll review it as carefully as possible. Did you use the branch develop to of yadism to generate the updated coefficients? For the charm measurement there is not much we can do (except asking a confirmation to Juan)

Radonirinaunimi commented 2 years ago

@giacomomagni I believe everything now is fixed in terms of implementation. You can have a go now.

@Radonirinaunimi okay I'll review it as carefully as possible. Did you use the branch develop to of yadism to generate the updated coefficients?

Actually no (TBH, I totally forgot about this)! This might indeed be the reason why the NUTEV_ datasets were modified. I'd be then good as a crosscheck if you could re-generate the files for all MATCHING and PROTONBC in case there are other discrepancies.

Just as a note, all the PROTONBC have to be generate at the same time otherwise the .info file will be screwed:

nnu data proton_bc <folder_containing_grids>/grids-PROTONBC_*_MATCHING.tar.gz NNPDF40_nnlo_as_01180

giacomomagni commented 2 years ago

okay thanks, so I'll review and update the coefficients files.

Radonirinaunimi commented 2 years ago

@giacomomagni Why the standard NUTEV datasets have been update? The one in master should be correct, no (at least from the data-theory comparisons)?

giacomomagni commented 2 years ago

@giacomomagni Why the standard NUTEV datasets have been update? The one in master should be correct, no (at least from the data-theory comparisons)?

I'm not 100% sure why they changed. An option might be that we never fixed them on master. The data_vs_theory comparison did not use the coefficients tables... All the other files were unchanged when I tried to regenerate them.

Radonirinaunimi commented 2 years ago

Looking at the comparison plots, nutev-nuu-master vs nutev-nnu-this-pr & nutev-nub-master vs nutev-nub-this-pr, it does not seem that something has changed. Also, by comparing the numpy arrays, they all seem to be the exact same as in master. This means that this might have changed somewhere in this PR. So, everything is then consistent.

Radonirinaunimi commented 2 years ago

Okay, should be good for me. Last commits were for:

* [x]  reimplementation of the scripts to generate empty matching tables

* [x]  fix NUTEV coefficients

Before merging we need:

* [ ]  cov mat for the matching tables

* [ ]   run a fit to see if numbers make sense

@RoyStegeman I'd agree that we should add in this PR the part that computes the CovMat and do some testings before merging this PR (in case there are fundamental bugs).

RoyStegeman commented 2 years ago

Sure

Radonirinaunimi commented 2 years ago

Merging this given that its main purpose (namely building the framework to generate and fit Yadism-like datasets) is achieved. Issues related to fit stability are/will be investigated in other PRs.

NNPDF / nnusf

Matching to theory prediction #29