NNPDF / pineappl

PineAPPL is not an extension of APPLgrid
https://nnpdf.github.io/pineappl/

Evolution with large EKOs #209

Closed · alecandido closed this issue 7 months ago

alecandido commented 1 year ago

As requested in https://github.com/NNPDF/pineappl/issues/185#issuecomment-1439662372, I'm opening a new issue for this problem.

This is not an actual problem at the moment: a memory issue only arises for the EKO of the whole NNPDF4.0 data set at once, but we always split by process. The processes requiring the biggest EKOs are jets, which need an EKO of ~4 GB.

However, at the moment PineAPPL needs to load the whole 4 GB (or whatever the actual size is). This is only a problem on the PineAPPL side, since EKO can break the operator into pieces and load one at a time (i.e. one scale at a time).


Actually, the format is already available. As soon as I have solved the current "top priority", I will step in and solve this myself, providing a lazy loader to reduce memory consumption (based on the current EKO loader and the new format).

cschwan commented 1 year ago

Do we have evidence that there really is a problem with Grid::evolve?

alecandido commented 1 year ago

> Do we have evidence that there really is a problem with Grid::evolve?

The problem is in the signature: https://github.com/NNPDF/pineappl/blob/fc50e9c5d785076968f52ad9b913e32c0415bfa5/pineappl/src/evolution.rs#L168

If you ask for a rank-5 array, that's the full operator, and we know the size of the full EKO for these processes. Instead, we need to pass a callable returning an ArrayView4, so that the object for a given $Q^2$ element is only loaded into memory when you ask for it. This is available in Python, but not yet in Rust (however, it is just a matter of looking up the correct .np{y,z}.lz4 file inside a tar archive, so not much different from what you have already done).
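
Just to sketch what I have in mind on the Rust side (this is only an illustration: the crates chosen and, in particular, the member name inside the tar archive are placeholders, not the actual EKO layout):

```rust
use std::fs::File;
use std::io::Read;

use anyhow::{bail, Result};
use lz4_flex::frame::FrameDecoder;
use ndarray::Array4;
use ndarray_npy::ReadNpyExt;
use tar::Archive;

/// Load the 4-dimensional operator of a single scale from an EKO tar
/// archive, decompressing only the requested `.npy.lz4` member.
fn load_operator_slice(eko_path: &str, member: &str) -> Result<Array4<f64>> {
    let mut archive = Archive::new(File::open(eko_path)?);

    for entry in archive.entries()? {
        let entry = entry?;
        if entry.path()?.to_string_lossy() == member {
            // decompress the lz4 frame and parse the payload as a rank-4 `.npy`
            let mut decoder = FrameDecoder::new(entry);
            let mut buffer = Vec::new();
            decoder.read_to_end(&mut buffer)?;
            return Ok(Array4::<f64>::read_npy(buffer.as_slice())?);
        }
    }

    bail!("operator `{member}` not found in `{eko_path}`")
}
```

With something like this, Grid::evolve could accept a callable $\mu_F^2 \mapsto O_4$ and never hold more than one slice in memory.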

cschwan commented 1 year ago

I understand that the whole thing will be big, but have you ever 1) run Grid::evolve with a large EKO, 2) seen that it really is slow, and 3) verified that the culprit is passing the entire operator? From your description it seems that you anticipate a performance problem, but so far we don't seem to have any evidence for it.

alecandido commented 1 year ago

It is not a performance problem, just a memory consumption problem.

Passing a 5-dimensional operator constructed from the 4-dimensional ones, or a loader for the 4-dimensional operators, will have essentially the same performance (except that more pieces are involved, and joining them in Python might be slower). However, no one is concerned about this performance difference, only that the 5-dimensional object might be too big for some specific data sets on not-so-powerful machines, without any good reason to consume all that memory.

cschwan commented 1 year ago

OK, I see! In that case, a straightforward optimization would need functions to perform the following operations:

alecandido commented 1 year ago

We are more or less slicing the other way round, as described above ($\mu_F^2 \mapsto O_4$).

The reason is that different scales are really independent, while in flavor basis there are in general no vanishing (pid0, pid1) pairs. Indeed, the operator is block-diagonal in evolution basis (QCD: singlet/gluon + diagonal; QCD+QED: singlet-u/singlet-d/gluon/photon + valence-u/valence-d + diagonal), and once you rotate even a single basis (input or output) to the flavor basis, all entries are filled.
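
Schematically, at fixed initial and final scales the pure-QCD operator looks like

$$
O^{\mathrm{ev}} =
\begin{pmatrix}
O_{\Sigma\Sigma} & O_{\Sigma g} & & \\
O_{g\Sigma}      & O_{gg}       & & \\
                 &              & O_{V} & \\
                 &              &       & \ddots
\end{pmatrix}
\quad \text{in evolution basis,}
\qquad
O^{\mathrm{fl}} = R_{\mathrm{out}} \, O^{\mathrm{ev}} \, R_{\mathrm{in}}^{-1}
\quad \text{in flavor basis,}
$$

and since the rotations to flavor basis are dense, even a single rotation (on the input or the output side) fills essentially all (pid0, pid1) entries.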

The only special case is intrinsic charm: if you evolve charm below its matching scale (usually the charm mass), then the evolution is the identity in flavor basis. But this is really the only case.

Moreover, in order to construct a slice containing the elements related to all $\mu_F^2$ values, you have to load the full EKO (though not all at once, so there is no memory problem; it is just inefficient, because you read the same files from disk many times, and if you cache them the memory issue comes back).

cschwan commented 1 year ago

> We are more or less slicing the other way round, as described above ($\mu_F^2 \mapsto O_4$).

This can be done as well; I'm not sure it's as efficient, but we'd simply have to try it out.

> The reason is that different scales are really independent, while in flavor basis there are in general no vanishing (pid0, pid1) pairs. Indeed, the operator is block-diagonal in evolution basis (QCD: singlet/gluon + diagonal; QCD+QED: singlet-u/singlet-d/gluon/photon + valence-u/valence-d + diagonal), and once you rotate even a single basis (input or output) to the flavor basis, all entries are filled.

That's not entirely correct: for $N_f = 5$ PDFs there are no top quarks at either the process or the fitting scale (PIDs -6 and 6), often there is no photon, and at the fitting scale there are no bottom quarks (PIDs -5 and 5). For some PDFs there aren't even charm quarks! For NNPDF4.0 we can decrease the dimensions from 14x14 to 9x11 (99 entries instead of 196), and that's a reduction of the operator size of roughly 50%!

alecandido commented 1 year ago

Ok, you're right.

But for most processes even the bottom is there; only a fraction of the DIS data lack the bottom, and almost never at every value of $\mu^2$, so you can't drop the slice. The further limitations hold only when fitting a specific PDF set (e.g. $N_f = 3$, $N_f = 4$, ...), but these are considered variations in NNPDF (and mostly also by the other collaborations), so you only have a couple of them in a full release, and you usually develop only with the VFNS.

Moreover, as soon as we have QED (which has been decided and is almost done), the default fits and evolution will include the photon. So you will have all flavors but bottom and top at the input scale, and all flavors but top at, at least, one output scale. Thus your typical operator will be 10x12 (at least until someone proposes to fit the bottom, but for that we have time, since we're still digesting $c^-$).
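
Spelling out the counting in the 14-entry flavor basis (6 quarks, 6 antiquarks, gluon, photon):

$$
\underbrace{(14 - 4)}_{\text{no } t, \bar t, b, \bar b \text{ at the input scale}} \times \underbrace{(14 - 2)}_{\text{no } t, \bar t \text{ at the output scale}} = 10 \times 12 .
$$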

> This can be done as well; I'm not sure it's as efficient, but we'd simply have to try it out.

This is extremely convenient for evolving PDFs, and it is the only way to compute an EKO (since we can solve DGLAP only for one final scale at a time, but for all x values and flavors at the same time).

However, that EKO loader will be available anyhow, with the $\mu^2 \mapsto O_4$ layout, because it is the one matching the new output format. If you wish, once it is ready, you'll be able to test which way is most convenient (and decide whether to use it as is, or to reconstruct the $O_3$ slices by joining different final scales).

cschwan commented 12 months ago

@AleCandido I thought a bit more about what interface we should have for this, and one idea would be to add a Grid::evolve_slice. This is basically Grid::evolve, but it only evolves one (factorization-scale) slice of the Grid, corresponding to a 4-dimensional EKO slice, and ignores everything else. The user can then combine Grid::evolve_slice and Grid::merge to sum every evolved slice. In this way we should be able to avoid the questions about callbacks. We can also rewrite Grid::evolve in terms of Grid::evolve_slice. What do you think?
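
Schematically, the user-side loop would be something like the toy sketch below (stand-in types and a made-up contraction, just to show the control flow, not the actual PineAPPL API):

```rust
use ndarray::{Array2, Array4, ArrayView4};

/// Toy stand-in for the proposed `Grid::evolve_slice`: contract a single
/// 4-dimensional operator slice, indexed here as (pid1, x1, pid0, x0), with
/// the subgrid belonging to one factorization scale.
fn evolve_slice(operator: ArrayView4<f64>, subgrid: &Array2<f64>) -> Array2<f64> {
    let (_, _, pids0, xs0) = operator.dim();
    let mut fk = Array2::<f64>::zeros((pids0, xs0));
    for ((b, j, a, i), &op) in operator.indexed_iter() {
        fk[[a, i]] += op * subgrid[[b, j]];
    }
    fk
}

fn main() {
    // two fake (operator slice, subgrid) pairs, one per factorization scale;
    // in the real use case the operator slices would be loaded lazily, one
    // at a time, instead of being kept all in memory
    let slices = vec![
        (Array4::<f64>::ones((2, 3, 2, 3)), Array2::<f64>::ones((2, 3))),
        (Array4::<f64>::ones((2, 3, 2, 3)), Array2::<f64>::from_elem((2, 3), 0.5)),
    ];

    // evolve one slice at a time and sum ("merge") the results, so that at
    // any moment only one 4-dimensional operator has to be alive
    let mut total = Array2::<f64>::zeros((2, 3));
    for (operator, subgrid) in &slices {
        total = total + evolve_slice(operator.view(), subgrid);
    }
    println!("{total:?}");
}
```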

alecandido commented 12 months ago

I believe it's a very good solution. The spirit was actually to consume the slices one by one; I just never thought that the grid could be sliced as well.

For completeness: sliced EKOs will be the only EKOs from now on, so you would have to merge them anyhow to feed Grid::evolve. I would propose to even deprecate it, and eventually we could make it a wrapper around Grid::evolve_slice the moment the EKO Rust loader is available.

The only unpleasant bit is that we should still expose Grid::merge, to support the convolution with small EKOs fully in memory from Python (without writing intermediate grids), because at the moment Python only has access to merge_from_file. https://github.com/NNPDF/pineappl/blob/c756b4ca839804212c8432d9f2a5491e9d6e6202/pineappl_py/src/grid.rs#L614-L618

cschwan commented 12 months ago

Grid is Clone now, so I don't think there's any showstopper to exposing Grid::merge to the Python interface.

alecandido commented 12 months ago

> Grid is Clone now, so I don't think there's any showstopper to exposing Grid::merge to the Python interface.

Ah, great, I just didn't notice that it happened. We can immediately add Grid.merge to the Python interface.
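
Something along these lines should do; this is only a sketch mirroring the existing merge_from_file wrapper, and the exact argument type and error handling are assumptions, to be aligned with the other bindings:

```rust
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;

// meant to live inside the existing `#[pymethods] impl PyGrid` block in
// `pineappl_py/src/grid.rs`, next to `merge_from_file`
#[pymethods]
impl PyGrid {
    /// Merge `other` into this grid fully in memory, without writing an
    /// intermediate grid to disk.
    pub fn merge(&mut self, other: PyRef<PyGrid>) -> PyResult<()> {
        // `Grid` is `Clone` now, so we can clone the other grid and hand it
        // over to `Grid::merge`
        self.grid
            .merge(other.grid.clone())
            .map_err(|e| PyValueError::new_err(e.to_string()))
    }
}
```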

cschwan commented 12 months ago

PR #244 will implement the necessary changes.