NNPDF / pineappl

PineAPPL is not an extension of APPLgrid
https://nnpdf.github.io/pineappl/
GNU General Public License v3.0

Document guarantees about the stability of the CLI output #291

Open scarlehoff opened 6 months ago

scarlehoff commented 6 months ago

The PineAPPL CLI is much faster than, e.g., the Python interface. As a result, it is sometimes quite convenient to call the CLI and then parse its output.

I'm currently parsing the CLI output instead of using convolute_with_one (with or without scales). This seems to work in most (all?) cases. Can I trust this output to stay the same over time? If not, can we have some kind of --stable-output option whose format is guaranteed not to change?

At the moment my assumptions for uncert and convolve are:

  1. The first three lines are part of the header.
  2. The results are always the last columns and there's nothing beyond.

For reference, this is the code I'm using:

import subprocess as sp

import numpy as np


# before 0.74 you need to add `--silence-lhapdf`
def _convolute_with_pineappl_cli(grid_path, pdf_name, member):
    """Use the pineappl cli to compute the convolution of the grid with the given pdf

    Equivalent to:
        pineappl convolve <grid_path> <pdf>
    """
    cmd = ["pineappl", "convolve", grid_path, f"{pdf_name}/{member}"]
    # check=True raises if the command fails instead of returning an empty array
    result_raw = sp.run(cmd, stdout=sp.PIPE, check=True)
    results = result_raw.stdout.decode("utf-8").strip()
    # skip the three header lines, keep the last column of every remaining line
    predictions = [line.split()[-1] for line in results.split("\n")[3:]]
    return np.array(predictions, dtype=float)


def _scales_with_pineappl_cli(grid_path, pdf_name, n_scales):
    """Use the pineappl cli to compute the scale uncertainties for the grid

    Equivalent to:
        pineappl uncert <grid_path> <pdf> --scale-abs=<n_scales>
    """
    cmd = ["pineappl", "uncert", grid_path, pdf_name, f"--scale-abs={n_scales}"]
    result_raw = sp.run(cmd, stdout=sp.PIPE, check=True)
    results = result_raw.stdout.decode("utf-8").strip()
    # skip the three header lines, keep the last n_scales columns of every remaining line
    predictions = [line.split()[-n_scales:] for line in results.split("\n")[3:]]
    return np.array(predictions, dtype=float)

cschwan commented 6 months ago

It's not guaranteed to be stable, but so far the changes have been minimal between versions.
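
Since the layout is not guaranteed, one way to depend on it a little less is to drop the assumption about the number of header lines and instead keep only the rows in which every column parses as a number. The following is a minimal sketch of that idea, not something PineAPPL provides: the function name `parse_numeric_rows` is hypothetical, and `sample_output` is invented text for illustration, not real CLI output. It still assumes that the results are the last columns of each data row.

```python
import numpy as np


def parse_numeric_rows(cli_output, n_last=1):
    """Return the last `n_last` columns of every all-numeric row.

    Hypothetical helper: rows containing any non-numeric token (header,
    units, separators) are skipped, so the number of header lines does
    not matter. The result is always 2D with shape (n_rows, n_last).
    """
    rows = []
    for line in cli_output.splitlines():
        tokens = line.split()
        if len(tokens) < n_last:
            continue  # empty or too-short line
        try:
            values = [float(tok) for tok in tokens]
        except ValueError:
            continue  # at least one token is not a number: not a data row
        rows.append(values[-n_last:])
    return np.array(rows, dtype=float)


# Invented text that only mimics a header followed by data rows.
sample_output = """\
b etal dsig/detal
[] [pb]
-+----+----+------
0 2 2.25 7.5e2
1 2.25 2.5 6.9e2
"""

predictions = parse_numeric_rows(sample_output, n_last=1)
```

With `n_last=n_scales` the same helper would cover the `uncert` case, again under the assumption that the scale variations are the trailing columns.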

I'm thinking of making the output stable at some point and furthermore prefixing each line that's not data with # so you can simply parse the output as an ascii table.
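
If non-data lines were prefixed with # as described, the output could be read directly with `numpy.loadtxt`, which skips comment lines by default. The sketch below assumes that future format; `commented_output` is invented text, not what any current version prints, and `parse_commented_table` is a hypothetical name.

```python
import io

import numpy as np


def parse_commented_table(cli_output, n_last=1):
    """Read a whitespace-separated table whose non-data lines start with '#'.

    Hypothetical helper for the proposed format. Returns the last
    `n_last` columns as a 2D array of shape (n_rows, n_last).
    """
    # ndmin=2 keeps the result 2D even when there is a single data row
    table = np.loadtxt(io.StringIO(cli_output), comments="#", ndmin=2)
    return table[:, -n_last:]


# Invented example of what '#'-prefixed output might look like.
commented_output = """\
# b etal dsig/detal
# [] [pb]
0 2 2.25 7.5e2
1 2.25 2.5 6.9e2
"""

table_predictions = parse_commented_table(commented_output, n_last=1)
```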

Of course, the real issue here is that the Python interface is slower.