Closed alecandido closed 1 year ago
Actually, do we need to act in pineappl at all? Can't we just rely on UNIX's base64? I.e. something like
pineappl info --get runcard.tar.gz grid.pineappl.lz4 | base64 --decode - > runcard.tar.gz
should work, no? @cschwan
Good catch, @felixhekhorn!
In terms of reproducibility, I was thinking of the following:
# untars the (compressed) dataset directory AND theory used to create the grid
pineappl info --get runcards [grid] | base64 --decode | tar xzf -
if [[ $(pinefarm --version) != $(pineappl info --get pinefarm_version [grid]) ]]; then
echo "pinefarm version different"
fi
pinefarm run [directory-name] [theory-name]
pineappl diff [grid-1] [grid-2] NNPDF31_nnlo_as_0118_luxqed && echo "success!"
can't we just rely on UNIX' base64?
I simply didn't know that base64 was distributed with GNU coreutils :)
Me neither, but it makes sense considering that email uses it to send attachments.
In terms of reproducibility, I was thinking of the following:
I agree on the layout:
pinefarm
docs
We can also get rid of the various generator-specific metadata, for instance output.txt and launch.txt, which I believe are no longer relevant.
Ok, I agree on that. I would instead keep the unraveled content of metadata.txt as it is, since that should be common to all the grids.
We can also get rid of the various generator-specific metadata, for instance output.txt and launch.txt, which I believe are no longer relevant.
Mmm? I have no idea of Mg5, but I believe e.g. https://github.com/NNPDF/runcards/blob/master/pinecards/ATLAS_1JET_8TEV_R06/launch.txt contains relevant pieces of information. Where else is the #user_defined_cut set atlas_1jet_8tev_r06 = True?
It would be in the tarball contained in the metadata (and the numerical values in the theory), and therefore it would be stored twice.
As @cschwan said, the reproducibility is guaranteed by the tarball.
Where else is the #user_defined_cut set atlas_1jet_8tev_r06 = True?
This kind of data is useful for reproducing a grid, but not very meaningful at first inspection. If you ever need to retrieve that specific information, you can extract the tarball. But I expect that to happen seldom (while the meaning of the bins, the observable, or the paper reference are useful to keep at hand).
but then you have to do either
pinefarm run a b
will no longer work
Not really: the proposal is not to drop output.txt from the runcard, but only from the grid's metadata.
Not really: the proposal is not to drop output.txt from the runcard, but only from the grid's metadata.
okay - now I understand! yes, I agree
As we decided with @felixhekhorn, @cschwan and @scarlehoff, we will stop storing the pinecard version in the grid (or maybe make it optional?), and we will write the full tar-gzipped pinecard (the folder) in the metadata, encoded as a string with base64, which is a rather common encoding.
PineAPPL will provide support for extracting the tarball from the metadata (i.e. decoding base64 to bytes; a redirect should do the rest of the job, I guess).
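For reference, the encode/decode round trip described above can be sketched with standard tools only. This is a minimal sketch, not what PineAPPL or pinefarm actually do: the folder name EXAMPLE_PINECARD and its content are hypothetical, and the encoded string is kept in a shell variable instead of being written to a grid's metadata.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pinecard folder, created here only to keep the sketch self-contained.
workdir=$(mktemp -d)
mkdir -p "$workdir/EXAMPLE_PINECARD"
printf 'set example_cut = True\n' > "$workdir/EXAMPLE_PINECARD/launch.txt"

# Encode: tar-gzip the folder and turn the bytes into a single-line base64 string.
# In the real workflow this string would be the metadata value (assumption).
encoded=$(tar -czf - -C "$workdir" EXAMPLE_PINECARD | base64 | tr -d '\n')

# Decode: base64 back to bytes, then a pipe into tar unpacks the folder.
mkdir -p "$workdir/restored"
printf '%s' "$encoded" | base64 --decode | tar -xzf - -C "$workdir/restored"

# Compare the restored folder with the original one.
diff -r "$workdir/EXAMPLE_PINECARD" "$workdir/restored/EXAMPLE_PINECARD" && echo "round trip ok"
```

Stripping the newlines with tr keeps the encoded tarball on one line, which seems the more convenient form for a metadata string; base64 --decode accepts it either way.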