Closed cschwan closed 4 months ago
Commit 53acd7f9b930561741e9740480fca0d820068c7c implements a first candidate for Grid::evolve_slice
. The idea is first to reuse as much code as possible, write tests, then migrate Grid::evolve
to use Grid::evolve
and finally improve performance.
@AleCandido @felixhekhorn please have a look at https://github.com/NNPDF/pineappl/pull/244/files#diff-b5009bd7e6df3b2ec6c5ae2bca2a3e23ae3402990d34a878da2f81f9ee21bd56R71-R255 which introduces two new types, EkoSlices
and EkoSlicesIter
.
You first
EkoSlices
object by calling EkoSlices::new()
with the corresponding path for the tarball and additional information (renormalization scales, alphas and scale-variation factors). This reads just general information and doesn't yet read operators or slices of them.EkoSlices::iter_mut()
on your object, which creates an Iterator
that goes through the operator slices one after another. For each slice the iterator produces a tuple of OperatorSliceInfo
and Array4<f64>
, the corresponding operator slice.Grid::evolve_slice
, which as stated above, is very similar to Grid::evolve
except it only evolves the slice of the Grid
that matches the factorization scale of the operator.Grid
has been evolved, we reproduce what Grid::evolve
does, but without having the entire 5-dimensional operator in memory.I think these two types best belong into https://github.com/NNPDF/eko/pull/260, but here's a good playground to test them. I'm not completely convinced of the names, if you have better suggestions let me know.
I've already written a modified version of pineappl evolve
and confirmed that it's working in the one case I tested. However, I still want to support the older EKO formats; that's currently the missing bit.
In commits c8e85f82e2876afdb0144c6fa0370c5111976c9a, 7935cea73e304a4248ed3638c2146956c69768ca and 5d1a4e9e2b25b2fbc69c503b2ebe34daf75b7fd7 I added backwards compatibility, in the sense that previous file formats are now also supported.
Since commit 695e0d5674fdf6ccfe9c41ac0e32e594249c9d2b Grid::evolve_slice
now also works for DIS grids, and since commit b94ceeddeb1b1088873cd574587db23e9a416195 pineappl evolve
uses Grid:evolve_slice
instead of Grid::evolve
. Have a look at the function evolve_grid
in pineappl_cli/src/evolve.rs
to understand how Pineko would need to be changed.
It seems to work quite well already, but the evolution of E906 fails - some bins in the evolved grids are zero. I will investigate this tomorrow.
Commit b7872a77aa6530a086a28f8e29cf1a02c0957b8f fixed the evolution test with E906, for which the grid was the 'culprit'; it had some duplicated bins with the same bin limits. See also #237.
Now that we have an Iterator
over EKO slices, we can finally write the method that we wanted from the start:
impl Grid {
pub fn evolve_with_slices<'a>(
&self,
eko_slices: impl IntoIterator<Item = (OperatorSliceInfo, ArrayView4<'a, f64>)>,
order_mask: &[bool],
) -> Result<FkTable, GridError> {
// implementation
}
}
Instead of a callback we have an Iterator
, of course. We can then replace Grid::evolve
with Grid::evolve_with_slices
, where we pass the EkoSlices
object and the order mask to it. To make it work in Python we probably have to replace impl IntoIterator<...>
with the actual type (EkoSlices
).
Before commit ef962fa3896f6b8bb0c73b40794ff2e4969308e9, evolving CMS_2JET_7TEV.pineappl.lz4
with its 2.1-GByte-sized EKO on my laptop takes
real 95m34.725s
user 95m32.242s
sys 0m1.605s
and after it
real 28m26.322s
user 28m23.897s
sys 0m1.542s
That's an improvement by a factor of more than three!
Profiling shows that roughly 72% of the CPU cycles is spent on dot
and another 15% on scaled_add
, both in this line:
that's pretty good as well; we want the time to be spent on doing the linear algebra, not on the rest of the code, which is only organizational.
For this grid memory consumption is less than 150 MByte (resident).
On my desktop PC the numbers are similar:
real 73m18.566s
user 73m17.138s
sys 0m1.340s
vs.
real 20m15.134s
user 20m13.712s
sys 0m1.380s
Commit aa8bb9131320b296725e4f8fb0b676eacf54d670 replaces the critical line with two general_mat_mul
calls, which effectively gets rid of the scaled_add
call. On my desktop machine this improves
real 20m15.134s user 20m13.712s sys 0m1.380s
to
real 18m48.334s
user 18m46.855s
sys 0m1.441s
and on my laptop
real 28m26.322s user 28m23.897s sys 0m1.542s
to
real 25m48.613s
user 25m43.977s
sys 0m1.784s
That's a 7%/9% performance improvement over the previous commit, and perf
shows that roughly 92% of CPU cycles are now spent in general_mat_mul
.
Given that most of the CPU cycles are now spent in general_mat_mul
, the only way to possibly improve the performance further is to use an optimized BLAS implementation.
Here's the patch for this (using OpenBLAS):
diff --git a/pineappl/Cargo.toml b/pineappl/Cargo.toml
index 280fe4c..390f051 100644
--- a/pineappl/Cargo.toml
+++ b/pineappl/Cargo.toml
@@ -16,6 +16,7 @@ version.workspace = true
arrayvec = "0.7.2"
bincode = "1.3.3"
bitflags = "1.3.2"
+blas-src = { features = ["openblas"], optional = true, version = "0.8" }
enum_dispatch = "0.3.7"
float-cmp = "0.9.0"
git-version = "0.3.5"
@@ -23,6 +24,7 @@ indicatif = "0.16.2"
itertools = "0.10.1"
lz4_flex = "0.9.2"
ndarray = { features = ["serde"], version = "0.15.4" }
+openblas-src = { version = "0.10", optional = true, features = ["cblas", "static"] }
rustc-hash = "1.1.0"
serde = { features = ["derive"], version = "1.0.130" }
thiserror = "1.0.30"
@@ -34,3 +36,6 @@ rand = { default-features = false, version = "0.8.4" }
rand_pcg = { default-features = false, version = "0.3.1" }
serde_yaml = "0.9.13"
ndarray-npy = "0.8.1"
+
+[features]
+blas = ["dep:blas-src", "ndarray/blas", "dep:openblas-src"]
diff --git a/pineappl/src/evolution.rs b/pineappl/src/evolution.rs
index f7ddc23..551b965 100644
--- a/pineappl/src/evolution.rs
+++ b/pineappl/src/evolution.rs
@@ -1,5 +1,8 @@
//! Supporting classes and functions for [`Grid::evolve`].
+#[cfg(feature = "blas")]
+extern crate blas_src;
+
use super::grid::{Grid, GridError, Order};
use super::import_only_subgrid::ImportOnlySubgridV2;
use super::lumi::LumiEntry;
diff --git a/pineappl_cli/Cargo.toml b/pineappl_cli/Cargo.toml
index c2eacf4..1599be4 100644
--- a/pineappl_cli/Cargo.toml
+++ b/pineappl_cli/Cargo.toml
@@ -50,6 +50,7 @@ rustc-args = [ "--cfg feature=\"docs-only\"" ]
[features]
applgrid = ["dep:pineappl_applgrid"]
+blas = ["pineappl/blas"]
evolve = ["dep:base64", "dep:either", "dep:tar", "dep:lz4_flex", "dep:ndarray-npy", "dep:serde", "dep:serde_yaml"]
fastnlo = ["dep:pineappl_fastnlo"]
fktable = ["dep:flate2", "dep:tar"]
ndarray
's standard dgemm
implementation matrixmultiply
:
real 25m48.613s
user 25m43.977s
sys 0m1.784s
real 27m27.393s
user 27m25.840s
sys 0m1.500s
CMS_2JET_7TEV.tar
has.real 21m51.330s
user 21m50.062s
sys 0m1.977s
r-src
is another backend, but it doesn't offer static linkage and therefore is probably not interesting to us.real 18m48.334s user 18m46.855s sys 0m1.441s
OpenBLAS on my desktop is 10% faster:
real 16m53.612s
user 16m52.386s
sys 0m1.740s
My desktop has an
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
and my laptop
Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
Another test: on my desktop using OpenBLAS, evolving ATLAS_1JET_8TEV_R06
, which has an EKO that is 45 GB large (888 slices). The evolution is a bit faster than 6 hours, and resident memory consumption was never larger than 500 MB. That's not pretty, but I think it's OK!
We may want to introduce a progress bar again, for which we can now easily use ProgressBar::wrap_iter
, which thanks to the Iterator interface doesn't require any modification to the evolution methods!
EDIT: this time really with OpenBLAS:
real 301m45.642s
user 300m24.360s
sys 0m54.963s
Commit 890c2fb85ae7b7198736af74c5945f5763a5666d implements the idea outlined in https://github.com/NNPDF/pineappl/pull/244#issuecomment-1761434678.
oookay, since this is getting more and more advanced let me point out some issues:
to be more specific:
pineappl evolve
again)Grid::evolve
is still working, right? we need it hereeventually we would like to use this progress also in pineko, of course, and for that I foresee the following strategy:
eko
crate (done e.g. in https://github.com/NNPDF/eko/pull/189 )eko.couplings
to Rust - a limitation of the current implementation is that you need to have a PDF associated, not for the comparison afterwards, but to provide $\alpha_s$ during the creationeko.interpolation
to Rust to allow on-the-fly reshaping (which will resolve the new break pointed out above)oookay, since this is getting more and more advanced let me point out some issues:
- the raw linear algebra seems to be in very good shape now thanks to the efforts of @cschwan (and this is the absolute basic)
I would agree in so far that I've implemented the really core functionality, and now we need more interfaces - I think this what you're saying.
- unfortunately this is not the only thing we need: but we're lacking behind on the interface part ... at the moment, I think, we're not able to profit from the progress made here in Python (i.e. pineko)
Yes, absolutely, this still needs to be done.
to be more specific:
- concerning versions (see also my comment above): I'm guessing the three version @cschwan is seeing are "EKO file v0.13", "EKO file v0.12" and "pineko interface" - do you agree?
Unfortunately I'm not sure what you call them, but I'm using three different versions, where
Metadata::V2
) of EKO is very likely "EKO file v0.13"Metadata::V1
) would probably be "EKO file v0.12"; please check this file: https://data.nnpdf.science/pineappl/test-data/CMS_TTB_8TEV_2D_TTM_TRAP_TOT.tar and tell me what version it is and
- (to remind you: with either
convolute
side-effect pineko#124 or Changemanipulate
to act on Operator eko#292 we will break (heavily) the interface topineappl evolve
again)
pineappl evolve
requires the correctly reinterpolated EKO, and if it isn't provided it'll complain about that. In that sense I don't see a problem. However, if I understand you correctly there may be some users that want the reinterpolated EKOs (which at some point may not be possible anymore) rather the ones for some standard x grid, which you now provide.
- the problem with the "pineko interface" is that we will not profit of the improvements made here (neither time nor memory), since pineko is still using the old interface (i.e. e.g. loading the full operator) - and since there is no easy workaround I strongly suggest to not address this issue here in this PR, but to postpone
Right, we'll need to rewrite Pineko to switch from Grid::evolve
to use Grid::evolve_with_slice_iter
.
- to be explicit: @cschwan
Grid::evolve
is still working, right? we need it here
Yes, the code is completely untouched, but I plan to migrate Grid::evolve
to use the new methods under the hood (that's really a matter of untested code, which it is right now, and therefore test coverage), but the user shouldn't notice this. Eventually, Grid::evolve
should be deprecated.
eventually we would like to use this progress also in pineko, of course, and for that I foresee the following strategy:
introduce
eko
crate (done e.g. in https://github.com/NNPDF/eko/pull/189 )as suggested above maybe move some of the stuff from here to there
shift
eko.couplings
to Rust - a limitation of the current implementation is that you need to have a PDF associated, not for the comparison afterwards, but to provide αs during the creationeventually shift
eko.interpolation
to Rust to allow on-the-fly reshaping (which will resolve the new break pointed out above)
I agree with that plan, and if we're lazy we could simply dump the code in https://github.com/NNPDF/pineappl/blob/890c2fb85ae7b7198736af74c5945f5763a5666d/pineappl_cli/src/evolve.rs in the eko
module into a new Rust crate and call it a day. Realistically you'll probably want to do more as you said.
The reshaping isn't necessarily needed in Rust, it can also be done in EKO or probably Pineko. Having it done while loading in Rust, and then loading a much smaller operator, would be the cherry on top :smile: of it though. I don't think pineappl evolve
is too important, that's Pineko's job after all. I see it more as a tool implement features like this one, and having everything in pineappl
makes testing/debugging much easier for me.
Unfortunately I'm not sure what you call them, but I'm using three different versions, where
1. the most recent one (`Metadata::V2`) of EKO is very likely "EKO file v0.13" 2. the previous one (`Metadata::V1`) would probably be "EKO file v0.12"; please check this file: [data.nnpdf.science/pineappl/test-data/CMS_TTB_8TEV_2D_TTM_TRAP_TOT.tar](https://data.nnpdf.science/pineappl/test-data/CMS_TTB_8TEV_2D_TTM_TRAP_TOT.tar) and tell me what version it is and 3. the oldest EKO file format that is supported is for instance [data.nnpdf.science/pineappl/test-data/LHCB_WP_7TEV.tar](https://data.nnpdf.science/pineappl/test-data/LHCB_WP_7TEV.tar), which has one 5D operator.
I agree with the general approach of keeping data versions, but this is done inconsistently in EKO itself (though we are in the process of doing it as well in EKO, but not ready yet https://github.com/NNPDF/eko/issues/175).
Most likely the versions are exactly the ones you say, but I'd rather:
Grid::evolve
, and even this one I'd only keep for a transient)Consider that the data format has been kind of stabilized in EKO 0.13. It's not stable as a block, in the sense that some objects (like the runcards) are still mildly changing (and maybe even the metadata will change in 0.14, but I don't remember exactly, @felixhekhorn?). But the operators' content is as stable as reasonable: the arrays are those, and there is no foreseeable need nor plan to change them.
I'd rather focus on 0.13 and forget about the rest. Moreover, I'd propose to call it EKO data format v1, and start using the __data_version__
.
In PineAPPL you could keep using the version you are using right now, and just drop the older ones (for simplicity, and to save some valuable time). As soon as https://github.com/NNPDF/eko/pull/260 will roll out, we can do proper version management on the EKO side, and PineAPPL will only deal with the crate version.
Of course it is just a proposal, but I tried to take into account the short and long term as well, in the most practical way (I hope).
- (to remind you: with either
convolute
side-effect pineko#124 or Changemanipulate
to act on Operator eko#292 we will break (heavily) the interface topineappl evolve
again)
pineappl evolve
requires the correctly reinterpolated EKO, and if it isn't provided it'll complain about that. In that sense I don't see a problem. However, if I understand you correctly there may be some users that want the reinterpolated EKOs (which at some point may not be possible anymore) rather the ones for some standard x grid, which you now provide.
- the problem with the "pineko interface" is that we will not profit of the improvements made here (neither time nor memory), since pineko is still using the old interface (i.e. e.g. loading the full operator) - and since there is no easy workaround I strongly suggest to not address this issue here in this PR, but to postpone
Right, we'll need to rewrite Pineko to switch from
Grid::evolve
to useGrid::evolve_with_slice_iter
.
If we do not have a strong immediate need, I'd avoid solving the "Pineko problem". Pineko is working, possibly in a suboptimal way and heavy requirements, but working. Since things are complicated enough, I'd focus for a moment on PineAPPL and EKO themselves. The moment this relation is polished enough, we'll inform our users (i.e. we can move to Pineko). Let's just do one step at a time.
(Concerning the reinterpolated EKO, it's the exact same: we will always be able to dump the reinterpolated object, and use that directly - this is just suboptimal for memory and disk, but it is working now, and it will keep working, until the new interface will be ready)
Note to myself: adding a test with scale varied EKOs should be top priority, everything else can be added afterwards.
In commits d2a4f3d72e1edf898dbb653db848fa7ab9d10d33, b3450a5c27e65eeb088daff20e78b143bd735e0a and 6a86a88b578d953f3e8b1d352bf412e375b21283 I've removed the variables xir
, xif
, ren1
and alphas
from OperatorSliceInfo
, which reflects the changes between the old a new EKO versions. This should make the Rust EKO reader simpler.
Commits 90d9efdda34092786a59f1cb5c4091575acc9a72 and 1e417a0f6aace1a69256a61034b1f5d6c36c51eb add a first Python interface for Grid::evolve_with_slice_iter
, but I haven't been able to test it. @alecandido @felixhekhorn, if you'd like to go ahead with this please let me know, otherwise I will do it in a few days.
I'm not going to test myself in the short term, but thanks for this.
However, I still believe it's more promising to avoid passing arrays through the Python interface (as much as possible, of course). So, I'd rather invest in driving the Rust loader (differently from grids, the EKO is on disk, and this is all happening to avoid loading it entirely - so, communicating through files is not a limitation).
I've just merged this branch. The Python interface will be added in a separate PR.
TODO list:
mod eko
to EKOGrid::evolve
: implemented in commit 0292ac89f8c7ced44ef4f4a30e9a599808aab254Grid::evolve
: commit 91246246b406d18923b66d00c67c0ac4f7332092