Generalize the APPLgrid exporter

janw20 commented 2 months ago

Hi, I implemented the cases not yet implemented in the APPLgrid exporter. This includes:

Adding support for exporting PineAPPL grids with multi-dimensional and non-consecutive bins by exporting them as APPLgrids with integer bin limits 0..N. The ordering is the same as the PineAPPL bins, e.g. given by pineappl read --bins. An info message about the bin limits is printed to the user when a PineAPPL grid with multi-dimensional and non-consecutive bins is exported.
Adding support for arbitrary Q2, x1, x2 grids
Adding the option epsilon = 1e-12 to approx_eq! in lines 240 & 257. Otherwise, this failed for the grid I tested this with, for the two values 0.04979197630496172 and 0.04979197630496281 (the difference is only in the last 3 digits). Or would you handle this differently?

cschwan commented 2 months ago

Concerning 1): this is a good idea, it's surely better to do something than to error out. There's probably no meaningful alternative to the standard bin limits (0, 1), (1, 2), ... in the general case.

2): The problem with general bin limits is that APPLgrid requires a fixed functional form, which means that the bin limits x_i must be the results of a function x_i = f(i), and f is hard-coded in APPLgrid. In LagrangeSubgridV{1,2} we use the same function and therefore it happens to work in most cases. However, imagine changing the spacing between some of the points between x_min and x_max; in that case the APPLgrid won't give the right results. In other words: you only ever set x_min and x_max, but never the points in between, and they can be random.

3): I'd try changing the ulps parameter, there shouldn't be such a large difference (try doubling it).

For points 2) and 3) it would be good to have grids to test that with.

janw20 commented 2 months ago

Right, we could throw an error for now if a subgrid is not a LagrangeSubgridVx? Also, I did not know how to extract the interpolation order from the grid values, so I am just leaving it at the default value. Is there a way how to reconstruct the interpolation order from just the grid points? Or should the interpolation order maybe be stored in the grid files?
At ulps ≈ 470 this works without the epsilon, below that it still fails for x = 0.000014007074397454824 and 0.00001400707439745562 in my case. Should I just set it to 512 or is there a more systematic way to determine it? The float_cmp author recommends to set both ulps and epsilon, where epsilon is a small multiple of f64::EPSILON. For me this starts working with about 5.0 * f64::EPSILON with ulps = 128.

cschwan commented 2 months ago

Right, we could throw an error for now if a subgrid is not a LagrangeSubgridVx? Also, I did not know how to extract the interpolation order from the grid values, so I am just leaving it at the default value. Is there a way how to reconstruct the interpolation order from just the grid points? Or should the interpolation order maybe be stored in the grid files?

We can't throw an error, because then optimized grids can't be exported. That would be a big loss.

There's no way to reconstruct the interpolation order from the grid points alone, you can use the 50 that we always use for different interpolation orders. However I don't think the interpolation order in the exported APPLgrid is important, usually you don't want to fill them after having them exported.

At ulps ≈ 470 this works without the epsilon, below that it still fails for x = 0.000014007074397454824 and 0.00001400707439745562 in my case. Should I just set it to 512 or is there a more systematic way to determine it? The float_cmp author recommends to set both ulps and epsilon, where epsilon is a small multiple of f64::EPSILON. For me this starts working with about 5.0 * f64::EPSILON with ulps = 128.

Set it to 512. I don't think we need an epsilon comparison; I believe this is only needed when we compare against a zero result (0.0 != x for every non-zero x no matter the ulps parameter).

janw20 commented 2 months ago

I could also add a --ulps argument to pineappl export, which lets the user set a different value than the value of 512, with 512 then being a hard-coded default value?

cschwan commented 2 months ago

I could also add a --ulps argument to pineappl export, which lets the user set a different value than the value of 512, with 512 then being a hard-coded default value?

I don't think this is needed.

janw20 commented 2 months ago

Okay, I set ulps = 512 now in the checks of the x grid and removed the epsilon argument. From my side this can be merged

cschwan commented 2 months ago

What's missing are tests that check the features that you implemented. We need a PineAPPL grid

with multidimensional bin limits that have bin widths not equal to one, and one
with custom x-grid values.

Those two cases are possibly broken, so we better make sure to test it.

janw20 commented 1 month ago

While working on this, I realized: Why are the SubgridParams reconstructed? Why aren't the SubgridParams from the Grid used, e.g. by exposing them in a pub fn subgrid_params()?

cschwan commented 1 month ago

Why are the SubgridParams reconstructed? Why aren't the SubgridParams from the Grid used, e.g. by exposing them in a pub fn subgrid_params()?

Not every Subgrid necessarily has SubgridParams in the way understood by LagrangeSubgridV{1,2}. ImportOnlySubgridV{1,2} may have arbitrary x-node values, for instance linearly-spaced ones like [0.1, 0.2, 0.3, 0.4, 0.5]. These points are impossible to generate with the LagrangeSubgridV{1,2}.

janw20 commented 1 month ago

linearly-spaced ones like [0.1, 0.2, 0.3, 0.4, 0.5]

But the exporter can't handle these anyway, since APPLgrid can't handle them, right? (At least that's how I understood your first comment in this pull request.) So it would only matter (for the exporter) how the LagrangeSubgridV{1,2} handles the SubgridParams.

The ExtraSubgridParams are also not much of an issue, I think, since APPLgrid doesn't support different x1 and x2 values.

Also, is the Grid::subgrid_params field updated when optimizing the grid? Because if not, using this field would even solve the issue of stripped-away grid points at the start and the end of the grid axes.

janw20 commented 1 month ago

One could also properly document a potential Grid::subgrid_params() function, i.e. say that these are the parameters the subgrids were initialized with, not necessarily the current parameters.

cschwan commented 1 month ago

linearly-spaced ones like [0.1, 0.2, 0.3, 0.4, 0.5]

But the exporter can't handle these anyway, since APPLgrid can't handle them, right? (At least that's how I understood your first comment in this pull request.) So it would only matter (for the exporter) how the LagrangeSubgridV{1,2} handles the SubgridParams.

Yes, APPLgrid only understands the grid-node values spaced as LagrangeSubgridV{1,2} uses them.

The ExtraSubgridParams are also not much of an issue, I think, since APPLgrid doesn't support different x1 and x2 values.

It does support DIS grids, however, where one of the is trivial (zero and usually one x-grid value in the unused x-dimension)

Also, is the Grid::subgrid_params field updated when optimizing the grid? Because if not, using this field would even solve the issue of stripped-away grid points at the start and the end of the grid axes.

No it's not updated, but you can't rely on this field, because you can merge any kind of subgrid into any Grid. Grid::subgrid_params is solely for the purpose of filling temporarily empty subgrids (and this is an optimization, see the implementation of Grid::fill).

janw20 commented 1 month ago

No it's not updated, but you can't rely on this field, because you can merge any kind of subgrid into any Grid.

I see, this would be a showstopper for using Grid::subgrid_params then. Thanks for the clarification.

NNPDF / pineappl

Generalize the APPLgrid exporter #285