Add interfaces compatible with EKO v0.11.x

alecandido commented 2 years ago

The best thing would be to be EKO independent, but this is not completely possible.

Furthermore, with the new structure will also be a not so clever idea, since we want to allow for very big EKOs, to be loaded one Q2 at a time.

So, for the future, we have two options:

either we find a way to pass a callable with a certain interface, and PineAPPL will query for operators, when needed
or we just make a Rust EKO library (needed in any case, NNPDF/eko#97, but will be extremely tiny), and have PineAPPL directly depend on it

For the time being, we just stick to the single operator, but I propose to move the construction out of PineAPPLpy

[x] receive directly a concatenated array in convolute_eko
[ ] same in evolve
[ ] recover tests

cschwan commented 2 years ago

It seems you're changing the arguments of the function and thus the API. If this is the case we can't merge it into master without breaking semver compatibility.

alecandido commented 2 years ago

If you wish, I can preserve the arguments, but that is not the only issue with compatibility.

In general, now we have a new memory structure, and maybe it is not 100% stable, because we just introduced and we need to do a few things. But the intention is to be as close as possible to the "final" one.

This was the first step towards the computation of jets EKOs, and in principle the structure is there to be used. Unfortunately, the way it works is not extremely simple to integrate with PineAPPL, since the way we did until now is to use rank 5 arrays: https://github.com/NNPDF/pineappl/blob/a913b6a5b8154b407283a36fba6081cf6ba58907/pineappl/src/grid.rs#L1794 but that's the object being too big, so we are splitting by Q2 in a list of rank 4 arrays, and loading them in memory one by one (and unload as soon as it has been used).

So, we have two goals:

in the short term, recover the old behavior (i.e. Array5), but starting from the new structure
in the longer term (but not too long), use the new behavior (i.e. Array4 one by one) to save memory

This PR is trying to do 1., asking immediately for the Array5, in such a way not too depend any longer so much on the EKO internal structure (so the concatenation is done by someone else). This would also improve current memory usage, since we can try to drop the operator errors, without affecting PineAPPL any longer (it is only a factor of 2, but it matters for large objects).

alecandido commented 2 years ago

Similar for the muf2_grid, since in this way PineAPPL doesn't need to rely on how EKO stores them.

felixhekhorn commented 2 years ago

It seems you're changing the arguments of the function and thus the API. If this is the case we can't merge it into master without breaking semver compatibility.

Consider that we could even fake in pineko for the moment and only do a single PR against PineAPPL the moment our output is more stable (because I expect still further changes)

alecandido commented 2 years ago

Consider that we could even fake in pineko for the moment and only do a single PR against PineAPPL the moment our output is more stable (because I expect still further changes)a

This I didn't consider, but it might be reasonable. On one side we are doing something not optimal twice: here, working with a legacy structure, and in Pineko, mocking to keep it.

But it's true that in the end we want to go for 2., so we can also go for mocking as a temporary solution, and face the change when we'll be ready for the proper one.

@cschwan how would you proceed for 2.? (in principle we could even start working for it: we don't have the full product in EKO, but this mostly concern computation, the structure is already there - apart confidence on its stability)

cschwan commented 2 years ago

At the moment option 1. (in the first comment) seems most attractive to me, mainly because that's what evolution::operators does in any case and because we could probably easily extend to also support slicing in the other dimensions.

cschwan commented 2 years ago

For option 2. you basically have to copy what I did in pineappl evolve.

alecandido commented 2 years ago

For option 2. you basically have to copy what I did in pineappl evolve.

Indeed, you have already done most of the job.

The practical question is actually more about if we want an explicit dependency on a single function in a given crate, or we just define an interface.

In any case, I would build on what you have already done, and I would also stick to just Grid.evolve for the complete version, and keep Grid.convolute_eko with the legacy behavior (i.e. Array5). In any case, it is already deprecated, and we will drop.

The proposal is to move that part of your code (the data loader in pineappl evolve) in the EKO repo, since it is related to the EKO format, so it makes sense that follows the natural evolution and development of the rest of EKO. We can't use directly that one, because we are changing the EKO format right now, and in particular we need lazy loading (and early dropping) for the memory. Apart from this, that's the code we need.

So, let's finalize the interface vs dependency decision, and then I'll start sketching the loader in EKO NNPDF/EKO#97 (maybe with the help of @andreab1997, while @felixhekhorn will take care of the completion of split computation NNPDF/EKO#138, that is the main bottleneck)

cschwan commented 2 years ago

In terms of memory it would probably be best to lazy load a three-dimensional array of the type used here:

https://github.com/NNPDF/pineappl/blob/4a183b5ed7bd0448baba42c810393cd73b938ade/pineappl/src/evolution.rs#L179-L185

This extracts (a slice of the) operators for each non-zero (pid1, pid0) tuple as follows:

in dimension 0, the factorization scale values active in the grid (fac1_indices),
in dimension 2, the x1 grid values active in the grid (x1_indices) and
all entries in dimension 4.

Storing three-dimensional operators would give you the benefit of throwing away operators that are zero (that's probably important if you have one big EKO where dimensions 0 and 2 are big).

As an interface we could have two callables:

pid0(pid1: Vec<i32>) -> Vec<i32> - given the vector of pid1, for which pid0 are there non-zero operators?
operator(fac1_slice, x1_slice) -> Array3<f64> - return the operator described above.

alecandido commented 2 years ago

That's an interesting idea, but it is something more with respect to the existing solution.

For us, we have to compute rank 4 operators, but they are computed one final scale at a time. That's why it made sense to split by what you call fac1_indices.

So: the layout how they are produced will be a set of Array4. Nothing prevents from optimizing the layout before convolution, also once for all (optimize and dump). However, it is not my main goal right now, so if you wish we/you can do the offline optimization or virtualize such an interface (but I don't expect to be much convenient, if in the end relies on Array4 below).

Instead, about the interface vs dependency, maybe there is not really a conflict: I will start providing an interface like operator(fac1_slice) -> Array4<f64>. If PineAPPL will depend on this directly (I believe more likely), or it will receive the callable from Pineko, we can decide afterwards.

The moment I have something, you can have a look and see if it is possible/sensible to implement your interface on top.

cschwan commented 2 years ago

@AleCandido sounds good!

alecandido commented 1 year ago

By now, this is definitely outdated. I'd propose to close and wait for https://github.com/NNPDF/eko/pull/260, that would be the implementation of what is being discussed above.

Until then, I believe that there is no point to keep discussing this issue.

NNPDF / pineappl

Add interfaces compatible with EKO v0.11.x #186