Improve theory handling

cschwan commented 3 years ago

Starting with the Python implementation of the runner, we have to specify a theory whenever we want to generate a grid, for instance:

./rr run TEST_RUN_SH theories/theory_200.yaml

I think we should reflect this in the filename of the generated grid, so in this instance we should generate

[ ] TEST_RUN_SH_T200.pineappl.lz4, which tells us the grid was generated with theory 200.
- alternatively use folders
[ ] define a unified theory card format, that will include all parameters of all the generators
- some of them can have a default value, if they are not (or not yet) in NNPDF theory db
- it should be possible to generate the unified theory card from an entry of the theory db
[ ] add a theory converter for each external
- given a unified theory card, it should extract the minimal theory required for the given generator, by filtering and rearranging
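The converter idea in the last item (filter the unified card, then rearrange it for one generator) could look roughly like the following. This is only a sketch: the key map, the generator-side names (`mass_z`, `mass_w`, `fermi_constant`) and the defaults table are hypothetical placeholders, not an agreed card format, and the numerical values are illustrative.

```python
# Hypothetical mapping from unified-card keys to the names one generator expects.
# These names are placeholders for illustration, not an agreed format.
GENERATOR_KEYS = {
    "MZ": "mass_z",
    "MW": "mass_w",
    "GF": "fermi_constant",
}

# Hypothetical defaults for parameters that a theory db entry may not provide.
DEFAULTS = {"GF": 1.1663787e-5}


def convert(unified_card, key_map=GENERATOR_KEYS, defaults=DEFAULTS):
    """Extract the minimal theory a generator needs from a unified card."""
    minimal = {}
    for unified_key, generator_key in key_map.items():
        if unified_key in unified_card:
            # Parameter present in the card: rename it for the generator.
            minimal[generator_key] = unified_card[unified_key]
        elif unified_key in defaults:
            # Parameter missing from the card: fall back to the default.
            minimal[generator_key] = defaults[unified_key]
        else:
            raise KeyError(f"missing theory parameter: {unified_key}")
    return minimal


# Illustrative card; keys not listed in the key map (ID, PTO) are filtered out.
card = {"ID": 200, "MZ": 91.1876, "MW": 80.398, "PTO": 2}
minimal_theory = convert(card)
```

Each external would then carry its own key map, so the unified card itself stays generator-agnostic.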

Furthermore, we need to discuss

[ ] which parameters should be in the theory database (we definitely need an overcomplete set of parameters),
[ ] what each parameter means,
[ ] where each parameter is typically used (and sometimes it is important where it isn't: we need to talk about DIS vs. hadron collider observables).

Further steps breakout:

[ ] use the theory provided variables (replace the hardcoded ones)
[ ] change the name of the final grid to include the theory dependency
[ ] apply a coherent scheme for CLI arguments

alecandido commented 3 years ago

I'll start working on these right after NNPDF/runcards#108 is more or less settled.

alecandido commented 3 years ago

A couple of comments about practical steps

Further steps breakout:

[ ] use the theory provided variables (replace the hardcoded ones)

This one is fairly easy on its own, but most likely we'll immediately need a theory extension, because not all the parameters used are currently available in the theory database (e.g. widths are not).

[ ] change the name of the final grid to include the theory dependency

This is really easy as it is, but there is a further complication: by adding just the theory ID, the grid name becomes dependent on evolution parameters that are irrelevant to the grid. For example, 4.0 most likely required a single set of PineAPPL grids, because there was no scan on EW parameters. Nevertheless, 4.0 consists of 10-20 theories, because they change evolution or alphas parameters.

Ideally I would like to depend only on parameters used (so one by one, not even on a subset as a block).
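One possible way to depend only on the parameters actually used is to derive the grid tag from a hash of those parameters rather than from the theory ID. The sketch below assumes that each run can list the keys it reads; `grid_tag`, `grid_name` and the example theories are hypothetical and not part of the runner.

```python
import hashlib
import json


def grid_tag(theory, used_keys):
    """Return a short tag that depends only on the parameters actually used."""
    used = {key: theory[key] for key in sorted(used_keys)}
    # Serialize deterministically so equal parameter sets give equal tags.
    payload = json.dumps(used, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:8]


def grid_name(dataset, theory, used_keys):
    """Build a grid filename from the dataset name and the parameter tag."""
    return f"{dataset}_{grid_tag(theory, used_keys)}.pineappl.lz4"


# Two illustrative theories that differ only in a parameter the grid ignores.
theory_a = {"ID": 200, "MZ": 91.1876, "alphas": 0.118}
theory_b = {"ID": 201, "MZ": 91.1876, "alphas": 0.120}
name_a = grid_name("TEST_RUN_SH", theory_a, ["MZ"])
name_b = grid_name("TEST_RUN_SH", theory_b, ["MZ"])
```

With this scheme the two theories map to the same filename, because only `MZ` enters the tag. The price is that the name is no longer human-readable, so a sidecar record of the hashed parameters would probably be needed.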

[ ] apply a coherent scheme for CLI arguments

The reason for this is that currently the runcard can be specified as a name or as a path, while for theories you always have to specify the path. I like the more flexible option of runcards.
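A minimal sketch of the same name-or-path flexibility for theories, assuming theory cards live in a `theories` folder and follow the `theory_<ID>.yaml` naming seen in the command above. `resolve_theory` is a hypothetical helper, not the current behaviour of `rr`.

```python
from pathlib import Path


def resolve_theory(arg, theories_dir="theories"):
    """Accept either a path to a theory card or a bare theory name/ID."""
    candidate = Path(arg)
    # Anything with a YAML suffix or a directory component is taken as a path.
    if candidate.suffix in {".yaml", ".yml"} or len(candidate.parts) > 1:
        return candidate
    # Otherwise treat the argument as a theory ID and build the default path.
    return Path(theories_dir) / f"theory_{arg}.yaml"


by_id = resolve_theory("200")
by_path = resolve_theory("theories/theory_200.yaml")
```

Both calls point at the same card, so `./rr run TEST_RUN_SH 200` and the explicit-path form could be made equivalent.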

cschwan commented 2 years ago

One problem that I've just noticed is that for MadGraph5_aMC@NLO runs the parameter values aren't extracted from the theory but rather from variables.json. This leads to a potential mismatch between DIS and collider datasets and is dangerous for this exercise: https://github.com/NNPDF/runcards/pull/134.

alecandido commented 2 years ago

Yes, this is well known: it is exactly the content of this issue.

If you remember, they were previously hard-coded in the run.sh, so variables.json was the intermediate step towards a consistent theory. The problem is that our theories do not contain all the fields of variables.json.

I was waiting for a consistent theory scheme upgrade, but right now a further intermediate step came to mind: we can overwrite variables.json parameters with those contained in the theory.

This is something we don't want to do for theory 400, since it would be another source of discrepancies with respect to theory 200 (and 400 is a transition theory anyhow): until now, APPLgrid parameters were not always consistent with the specified theory (think about grids we received from someone else). Nevertheless, now that consistency with the old toolchain has been proven, and we're computing our own grids, we can start overwriting known parameters. We'll get rid of variables.json incrementally, while expanding the theory scheme.
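The intermediate overwriting step could be sketched as below. It assumes that variables.json and the theory use the same key for the same parameter, which may not hold in practice; `overwrite_known` and the example values are hypothetical.

```python
def overwrite_known(variables, theory):
    """Return a copy of variables where shared keys take the theory value."""
    merged = dict(variables)
    for key in variables:
        if key in theory:
            # The theory is the source of truth for parameters it knows about.
            merged[key] = theory[key]
    return merged


# Illustrative values: the width (WZ) is absent from the theory, so it is kept.
variables = {"MZ": 91.2, "WZ": 2.4952}
theory = {"ID": 200, "MZ": 91.1876}
merged = overwrite_known(variables, theory)
```

Keys that exist only in the theory (such as `ID`) are not added, so the generator still sees exactly the fields of variables.json. As the theory scheme grows, fewer entries keep their variables.json value.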

NNPDF / pinefarm

Improve theory handling #1