Closed alecandido closed 1 year ago
It seems you're changing the arguments of the function and thus the API. If this is the case we can't merge it into master without breaking semver compatibility.
If you wish, I can preserve the arguments, but that is not the only issue with compatibility.
In general, now we have a new memory structure, and maybe it is not 100% stable, because we just introduced and we need to do a few things. But the intention is to be as close as possible to the "final" one.
This was the first step towards the computation of jets EKOs, and in principle the structure is there to be used. Unfortunately, the way it works is not extremely simple to integrate with PineAPPL, since the way we did until now is to use rank 5 arrays: https://github.com/NNPDF/pineappl/blob/a913b6a5b8154b407283a36fba6081cf6ba58907/pineappl/src/grid.rs#L1794 but that's the object being too big, so we are splitting by Q2 in a list of rank 4 arrays, and loading them in memory one by one (and unload as soon as it has been used).
So, we have two goals:
Array5
), but starting from the new structureArray4
one by one) to save memoryThis PR is trying to do 1., asking immediately for the Array5
, in such a way not too depend any longer so much on the EKO internal structure (so the concatenation is done by someone else). This would also improve current memory usage, since we can try to drop the operator errors, without affecting PineAPPL any longer (it is only a factor of 2, but it matters for large objects).
Similar for the muf2_grid
, since in this way PineAPPL doesn't need to rely on how EKO stores them.
It seems you're changing the arguments of the function and thus the API. If this is the case we can't merge it into master without breaking semver compatibility.
Consider that we could even fake in pineko for the moment and only do a single PR against PineAPPL the moment our output is more stable (because I expect still further changes)
Consider that we could even fake in pineko for the moment and only do a single PR against PineAPPL the moment our output is more stable (because I expect still further changes)a
This I didn't consider, but it might be reasonable. On one side we are doing something not optimal twice: here, working with a legacy structure, and in Pineko, mocking to keep it.
But it's true that in the end we want to go for 2., so we can also go for mocking as a temporary solution, and face the change when we'll be ready for the proper one.
@cschwan how would you proceed for 2.? (in principle we could even start working for it: we don't have the full product in EKO, but this mostly concern computation, the structure is already there - apart confidence on its stability)
At the moment option 1. (in the first comment) seems most attractive to me, mainly because that's what evolution::operators
does in any case and because we could probably easily extend to also support slicing in the other dimensions.
For option 2. you basically have to copy what I did in pineappl evolve
.
For option 2. you basically have to copy what I did in
pineappl evolve
.
Indeed, you have already done most of the job.
The practical question is actually more about if we want an explicit dependency on a single function in a given crate, or we just define an interface.
In any case, I would build on what you have already done, and I would also stick to just Grid.evolve
for the complete version, and keep Grid.convolute_eko
with the legacy behavior (i.e. Array5
). In any case, it is already deprecated, and we will drop.
The proposal is to move that part of your code (the data loader in pineappl evolve
) in the EKO repo, since it is related to the EKO format, so it makes sense that follows the natural evolution and development of the rest of EKO.
We can't use directly that one, because we are changing the EKO format right now, and in particular we need lazy loading (and early dropping) for the memory. Apart from this, that's the code we need.
So, let's finalize the interface vs dependency decision, and then I'll start sketching the loader in EKO NNPDF/EKO#97 (maybe with the help of @andreab1997, while @felixhekhorn will take care of the completion of split computation NNPDF/EKO#138, that is the main bottleneck)
In terms of memory it would probably be best to lazy load a three-dimensional array of the type used here:
This extracts (a slice of the) operators for each non-zero (pid1, pid0)
tuple as follows:
0
, the factorization scale values active in the grid (fac1_indices
),2
, the x1 grid values active in the grid (x1_indices
) and4
.Storing three-dimensional operators would give you the benefit of throwing away operators that are zero (that's probably important if you have one big EKO where dimensions 0
and 2
are big).
As an interface we could have two callables:
pid0(pid1: Vec<i32>) -> Vec<i32>
- given the vector of pid1
, for which pid0
are there non-zero operators?operator(fac1_slice, x1_slice) -> Array3<f64>
- return the operator described above.That's an interesting idea, but it is something more with respect to the existing solution.
For us, we have to compute rank 4 operators, but they are computed one final scale at a time. That's why it made sense to split by what you call fac1_indices
.
So: the layout how they are produced will be a set of Array4
. Nothing prevents from optimizing the layout before convolution, also once for all (optimize and dump). However, it is not my main goal right now, so if you wish we/you can do the offline optimization or virtualize such an interface (but I don't expect to be much convenient, if in the end relies on Array4
below).
Instead, about the interface vs dependency, maybe there is not really a conflict: I will start providing an interface like operator(fac1_slice) -> Array4<f64>
.
If PineAPPL will depend on this directly (I believe more likely), or it will receive the callable from Pineko, we can decide afterwards.
The moment I have something, you can have a look and see if it is possible/sensible to implement your interface on top.
@AleCandido sounds good!
By now, this is definitely outdated. I'd propose to close and wait for https://github.com/NNPDF/eko/pull/260, that would be the implementation of what is being discussed above.
Until then, I believe that there is no point to keep discussing this issue.
The best thing would be to be EKO independent, but this is not completely possible.
Furthermore, with the new structure will also be a not so clever idea, since we want to allow for very big EKOs, to be loaded one Q2 at a time.
So, for the future, we have two options:
For the time being, we just stick to the single operator, but I propose to move the construction out of PineAPPLpy
convolute_eko
evolve