NNPDF / pinefarm

Generate PineAPPL grids from PineCards
https://pinefarm.readthedocs.io
GNU General Public License v3.0

Store pinecard in the grid #15

Closed: alecandido closed 1 year ago

alecandido commented 1 year ago

As we decided with @felixhekhorn, @cschwan, and @scarlehoff, we will stop storing the pinecard version in the grid (or maybe make it optional?). Instead, we will write the full tar-gzipped pinecard (the folder) into the metadata, encoded as a base64 string, which is a rather common encoding.

PineAPPL will provide support for extracting the tarball from the metadata (i.e. decode base64 to bytes; a redirect should do the rest of the job, I guess).

felixhekhorn commented 1 year ago

Actually, do we need to act in pineappl at all? can't we just rely on UNIX' base64? i.e. something like pineappl info --get runcard.tar.gz grid.pineappl.lz4 | base64 --decode - > runcard.tar.gz should work, no? @cschwan
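The pipeline suggested above can be tried without a grid at hand. The following is a minimal sketch, assuming GNU coreutils `base64` and a `tar` with gzip support: the folder `EXAMPLE_CARD` and its content are hypothetical, and the `printf` call stands in for the output of `pineappl info --get`, which is not invoked here.

```shell
set -eu

# Hypothetical pinecard folder, created only for this illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/EXAMPLE_CARD"
echo "example metadata" > "$workdir/EXAMPLE_CARD/metadata.txt"

# Writing side: tar + gzip the folder, then base64-encode the bytes.
# Line breaks are stripped so the result is a single metadata string.
encoded=$(tar -czf - -C "$workdir" EXAMPLE_CARD | base64 | tr -d '\n')

# Reading side: printf stands in for the string read back from the grid.
printf '%s\n' "$encoded" | base64 --decode > "$workdir/runcard.tar.gz"

# Unpack into a separate directory and compare with the original folder.
mkdir -p "$workdir/restored"
tar -xzf "$workdir/runcard.tar.gz" -C "$workdir/restored"
diff -r "$workdir/EXAMPLE_CARD" "$workdir/restored/EXAMPLE_CARD" && echo "round trip ok"
```

Since base64 output is plain ASCII, the encoded tarball can live in a string-valued metadata entry, and decoding needs nothing beyond standard command-line tools.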

cschwan commented 1 year ago

Good catch, @felixhekhorn!

In terms of reproducibility, I was thinking of the following:

# untars the (compressed) dataset directory AND theory used to create the grid
pineappl info --get runcards [grid] | base64 --decode | tar xzf -

if [[ "$(pinefarm --version)" != "$(pineappl info --get pinefarm_version [grid])" ]]; then
    echo "pinefarm version different"
fi

pinefarm run [directory-name] [theory-name]

pineappl diff [grid-1] [grid-2] NNPDF31_nnlo_as_0118_luxqed && echo "success!"

alecandido commented 1 year ago

can't we just rely on UNIX' base64?

I simply didn't know that base64 was distributed with GNU coreutils :)

cschwan commented 1 year ago

Me neither, but it makes sense considering that email uses it to send attachments.

alecandido commented 1 year ago

In terms of reproducibility, I was thinking of the following:

I agree on the layout:

cschwan commented 1 year ago

We can also get rid of the various generator-specific metadata, for instance output.txt and launch.txt I believe no longer to be relevant.

alecandido commented 1 year ago

OK, I agree on that. I would instead keep the unraveled content of metadata.txt as it is, since that should be common to all the grids.

felixhekhorn commented 1 year ago

We can also get rid of the various generator-specific metadata, for instance output.txt and launch.txt I believe no longer to be relevant.

Hmm? I don't know Mg5, but I believe that e.g. https://github.com/NNPDF/runcards/blob/master/pinecards/ATLAS_1JET_8TEV_R06/launch.txt contains relevant pieces of information. Where else is the #user_defined_cut set atlas_1jet_8tev_r06 = True?

cschwan commented 1 year ago

It would be in the tarball contained in the metadata (and the numerical values in the theory), and therefore it would be stored twice.

alecandido commented 1 year ago

As @cschwan said, the reproducibility is guaranteed by the tarball.

Where else is the #user_defined_cut set atlas_1jet_8tev_r06 = True?

This kind of data is useful for reproducing a grid, but not very meaningful at first inspection. If you ever need to retrieve that specific piece of information, you can extract the tarball. But I expect that to happen seldom (while the meaning of the bins, the observable, or the paper reference are useful to keep at hand).
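Retrieving a single setting does not require unpacking the whole tarball to disk. The following is a rough sketch, assuming a `tar` that supports `-O` (write to stdout), as GNU tar and bsdtar do. The folder `EXAMPLE_CARD`, the file contents, and the setting `example_cut` are made up for illustration, and the `encoded` variable stands in for the string that would be read from the grid metadata.

```shell
set -eu

# Hypothetical pinecard with a generator-specific file and a common one.
tmp=$(mktemp -d)
mkdir -p "$tmp/EXAMPLE_CARD"
printf 'set example_cut = True\n' > "$tmp/EXAMPLE_CARD/launch.txt"
printf 'description: example\n' > "$tmp/EXAMPLE_CARD/metadata.txt"

# Stand-in for the base64 string stored in the grid metadata.
encoded=$(tar -czf - -C "$tmp" EXAMPLE_CARD | base64 | tr -d '\n')

# List the archive members without extracting anything.
printf '%s' "$encoded" | base64 --decode | tar -tzf -

# Stream one member to stdout and pick out the line of interest.
cut_line=$(printf '%s' "$encoded" | base64 --decode \
    | tar -xzOf - EXAMPLE_CARD/launch.txt | grep 'example_cut')
echo "$cut_line"
```

This keeps the rarely needed generator details one pipeline away, while the commonly inspected entries stay directly readable in the metadata.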

felixhekhorn commented 1 year ago

but then you have to do either

alecandido commented 1 year ago

Not really: the proposal is not to drop output.txt from the runcard, but only from grids metadata.

felixhekhorn commented 1 year ago

Not really: the proposal is not to drop output.txt from the runcard, but only from grids metadata.

Okay, now I understand! Yes, I agree.