NNPDF / pineappl

PineAPPL is not an extension of APPLgrid
https://nnpdf.github.io/pineappl/

Remove empty unused `x1` and `x2` grid values in subgrids #151

Closed: cschwan closed this issue 1 year ago

cschwan commented 2 years ago

We already remove unused grid points in the muf2/mur2 dimension, and we should be able to do the same with x1 and x2. This might require some modifications here and there, but it should reduce the size of each subgrid, since subgrids are usually non-zero only in a small range in x.

This will require changing Grid::optimize to also modify ImportOnlySubgridV1 subgrids.
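
A minimal sketch of the trimming step follows; the `Vec<Vec<f64>>` layout and the helper below are illustrative only, not pineappl's actual subgrid types: scan the table once, record the non-zero index ranges in x1 and x2, and keep only that block together with the corresponding x nodes.

```rust
// Find the x1/x2 index ranges outside of which the table is identically zero.
fn non_zero_ranges(table: &[Vec<f64>]) -> (std::ops::Range<usize>, std::ops::Range<usize>) {
    let nx1 = table.len();
    let nx2 = table.first().map_or(0, |row| row.len());

    // Start with empty ranges and grow them around every non-zero entry.
    let (mut x1_min, mut x1_max) = (nx1, 0);
    let (mut x2_min, mut x2_max) = (nx2, 0);

    for (i1, row) in table.iter().enumerate() {
        for (i2, &value) in row.iter().enumerate() {
            if value != 0.0 {
                x1_min = x1_min.min(i1);
                x1_max = x1_max.max(i1 + 1);
                x2_min = x2_min.min(i2);
                x2_max = x2_max.max(i2 + 1);
            }
        }
    }

    // Note: an all-zero table would yield empty (reversed) ranges here and
    // would need special handling in a real implementation.
    (x1_min..x1_max, x2_min..x2_max)
}

fn main() {
    // A 4x4 table that is non-zero only in the block [1..3) x [2..4).
    let mut table = vec![vec![0.0; 4]; 4];
    table[1][2] = 1.0;
    table[2][3] = 2.0;

    let (r1, r2) = non_zero_ranges(&table);
    assert_eq!((r1.clone(), r2.clone()), (1..3, 2..4));

    // Keeping only these index ranges (and the corresponding x1/x2 nodes)
    // shrinks the stored subgrid without changing its content.
    let trimmed: Vec<Vec<f64>> = table[r1]
        .iter()
        .map(|row| row[r2.clone()].to_vec())
        .collect();
    assert_eq!((trimmed.len(), trimmed[0].len()), (2, 2));
}
```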

cschwan commented 1 year ago

@AleCandido @felixhekhorn I am working on this change; it will not decrease the file sizes, but I hope it will improve the performance of the evolution a bit. The price we have to pay is that instead of always having the same x-grid everywhere - the usual 50 points - the grids return a subset of these 50 points. So while the x-grid points themselves are unchanged, each process uses a different range of them. This might interfere with the caching of the EKOs; what do you think?

felixhekhorn commented 1 year ago

Caching? The best we do at the moment is just the slice here (and that is on pid, so it is unaffected):
https://github.com/N3PDF/pineappl/blob/d924e9e41798c1f755849fd054938c05bbdb99c2/pineappl/src/grid.rs#L1740
Afterwards we have the index remap, which, as far as I understand, would need to be shifted down to the subgrid level:
https://github.com/N3PDF/pineappl/blob/d924e9e41798c1f755849fd054938c05bbdb99c2/pineappl/src/grid.rs#L1754
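
As far as I can tell, "shifting the remap down to subgrid level" would amount to something like the following sketch (hypothetical types, not the actual grid.rs code): each trimmed subgrid keeps the positions of its retained x1/x2 nodes in the common x-grid and scatters its entries back through those indices.

```rust
struct TrimmedSubgrid {
    /// positions of the retained x1 nodes in the global x-grid
    x1_indices: Vec<usize>,
    /// positions of the retained x2 nodes in the global x-grid
    x2_indices: Vec<usize>,
    /// table[i1][i2] for the retained nodes only
    table: Vec<Vec<f64>>,
}

impl TrimmedSubgrid {
    /// Accumulate this subgrid into a dense table defined on the full x-grid.
    fn scatter_into(&self, dense: &mut [Vec<f64>]) {
        for (i1, &g1) in self.x1_indices.iter().enumerate() {
            for (i2, &g2) in self.x2_indices.iter().enumerate() {
                dense[g1][g2] += self.table[i1][i2];
            }
        }
    }
}

fn main() {
    // Global x-grid with 50 points; this subgrid only touches 3 x 2 of them.
    let mut dense = vec![vec![0.0; 50]; 50];
    let subgrid = TrimmedSubgrid {
        x1_indices: vec![10, 11, 12],
        x2_indices: vec![40, 41],
        table: vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]],
    };
    subgrid.scatter_into(&mut dense);
    assert_eq!(dense[12][41], 6.0);
}
```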

cschwan commented 1 year ago

Sorry, I meant caching of the EKOs in pineko, or probably somewhere higher up. It might be a problem that every grid we'd like to evolve then has a different x-grid (well, technically a subset, as I said).

felixhekhorn commented 1 year ago

OK, that is no problem: with the output rework, my idea was to compute the EKOs just by Q2 and then do the interpolation in x, at the fitting or process scale, only on demand.
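
A rough sketch of that idea, under the assumption that a grid's x points are an exact subset of the master x-grid (the names and types below are made up for illustration and are not eko's or pineko's API): operators are cached by Q2 on the common x-grid and only restricted, or in general interpolated, to the points a grid actually requests.

```rust
use std::collections::HashMap;

struct EkoCache {
    /// master x-grid shared by all cached operators
    x_grid: Vec<f64>,
    /// operator[ix_out][ix_in] on the master x-grid, keyed by the Q2 bit
    /// pattern just to keep the sketch simple
    operators: HashMap<u64, Vec<Vec<f64>>>,
}

impl EkoCache {
    /// Return the operator at `q2`, restricted to the requested x points.
    /// This sketch assumes the requested points are an exact subset of the
    /// master grid, so "interpolation" reduces to picking columns; a real
    /// implementation would interpolate instead of panicking.
    fn restricted(&self, q2: f64, x_subset: &[f64]) -> Option<Vec<Vec<f64>>> {
        let op = self.operators.get(&q2.to_bits())?;
        let columns: Vec<usize> = x_subset
            .iter()
            .map(|x| {
                self.x_grid
                    .iter()
                    .position(|xi| xi == x)
                    .expect("requested x point is not on the master grid")
            })
            .collect();

        Some(
            op.iter()
                .map(|row| columns.iter().map(|&c| row[c]).collect())
                .collect(),
        )
    }
}

fn main() {
    let mut cache = EkoCache {
        x_grid: vec![0.1, 0.2, 0.5, 1.0],
        operators: HashMap::new(),
    };

    // A 4x4 identity operator at Q2 = 100, defined on the master x-grid.
    let identity: Vec<Vec<f64>> = (0..4)
        .map(|i| (0..4).map(|j| if i == j { 1.0 } else { 0.0 }).collect())
        .collect();
    cache.operators.insert(100.0_f64.to_bits(), identity);

    // Restrict the cached operator to the two x points one grid uses.
    let restricted = cache.restricted(100.0, &[0.2, 1.0]).unwrap();
    assert_eq!((restricted.len(), restricted[0].len()), (4, 2));
}
```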

alecandido commented 1 year ago

Of course we can do that, if needed (that's why we implemented it in the first place). But I'd try to avoid re-interpolating as much as possible...

I.e.: everything is fine, and we'll do as @felixhekhorn said, if we have to. But since we agreed to have a single x-grid for the theory, it would be ideal to have all the PineAPPL grids use the same one, if possible. This will automatically skip re-interpolation when it is not needed.

In any case, a subset is better than the other options: there will be less interpolation error.

cschwan commented 1 year ago

> I.e.: everything is fine, and we'll do as @felixhekhorn said, if we have to. But since we agreed to have a single x-grid for the theory,

This won't change.

> it would be ideal to have all the PineAPPL grids use the same one, if possible. This will automatically skip re-interpolation when it is not needed.

That's the case: you should be able to feed the same set of EKOs to Grid::convolute_eko, and it will select the necessary subset. But Grid::eko_info will return a smaller set of x-grid points, and I wonder whether, higher up the toolchain, this will lead to regenerating many subsets of the EKOs on the original 50x50 grid again and again for the newly implemented datasets with the current pineko implementation (?). But it's certainly possible to avoid that.

alecandido commented 1 year ago

> That's the case: you should be able to feed the same set of EKOs to Grid::convolute_eko, and it will select the necessary subset.

The point is that if you are going to use a smaller set of points, the interpolation basis is different, and thus so are the interpolation coefficients (you also have to account for the missing part). So re-interpolation is needed (though for a subset it is certainly advantageous).
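
A short way to see this, assuming the usual Lagrange-type interpolation on the x-grid nodes (the notation below is only for illustration): with nodes $\{x_j\}$, the basis polynomials are

$$
p_i(x) = \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}, \qquad f(x) \approx \sum_i f(x_i)\, p_i(x),
$$

so removing some of the nodes changes every remaining $p_i$, and with it the coefficients that weight the grid entries, even at the nodes that are kept. That is why a re-interpolation step is needed, although re-interpolating onto a subset of the original nodes keeps the additional interpolation error small.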

> But Grid::eko_info will return a smaller set of x-grid points, and I wonder whether, higher up the toolchain, this will lead to regenerating many subsets of the EKOs on the original 50x50 grid again and again for the newly implemented datasets with the current pineko implementation (?). But it's certainly possible to avoid that.

As @felixhekhorn said: if it's not already the case, we'll make sure that a new EKO won't be recomputed (unless strictly needed, e.g. because of a new Q2 request); instead, we'll just re-interpolate the available one.

However, what is already happening is that one EKO is generated for each dataset, since in general they have different Q2 values, even when the x-grid is actually the same. The "unique" EKO for the theory is only the one needed to evolve the fit result. Our goal is to transition to a two-EKOs-per-theory scheme, in which we'll have: