NNPDF / pineappl


Add method(s) to support larger EKOs #244

Closed cschwan closed 4 months ago

cschwan commented 9 months ago

TODO list:

cschwan commented 9 months ago

Commit 53acd7f9b930561741e9740480fca0d820068c7c implements a first candidate for Grid::evolve_slice. The idea is to first reuse as much code as possible, write tests, then migrate Grid::evolve to use Grid::evolve_slice and finally improve performance.

cschwan commented 9 months ago

@AleCandido @felixhekhorn please have a look at https://github.com/NNPDF/pineappl/pull/244/files#diff-b5009bd7e6df3b2ec6c5ae2bca2a3e23ae3402990d34a878da2f81f9ee21bd56R71-R255 which introduces two new types, EkoSlices and EkoSlicesIter.

The intended workflow is as follows (see also the sketch after this list):

  1. First create an EkoSlices object by calling EkoSlices::new() with the path of the tarball and some additional information (renormalization scales, alphas and scale-variation factors). This reads only general information and doesn't yet read any operators or slices of them.
  2. Then call EkoSlices::iter_mut() on that object, which creates an Iterator that goes through the operator slices one after another. For each slice the iterator produces a tuple of an OperatorSliceInfo and an Array4<f64>, the corresponding operator slice.
  3. This is meant to be used with the newly added Grid::evolve_slice, which, as stated above, is very similar to Grid::evolve except that it only evolves the slice of the Grid that matches the factorization scale of the operator.
  4. By iterating through all slices and checking that every slice of the Grid has been evolved, we reproduce what Grid::evolve does, but without ever having the entire 5-dimensional operator in memory.
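
Very roughly, the pieces are meant to compose like this; the exact argument lists of EkoSlices::new() and Grid::evolve_slice as well as the final combination step are assumptions inferred from the description above, not the actual signatures:

use std::path::Path;

// Grid, FkTable, GridError, EkoSlices and OperatorSliceInfo are the pineappl
// types discussed in this thread; this driver function is hypothetical
fn evolve_with_eko_slices(
    grid: &Grid,
    eko_path: &Path,
    ren1: &[f64],   // renormalization scales
    alphas: &[f64], // alpha_s values at those scales
    xir: f64,       // renormalization scale-variation factor
    xif: f64,       // factorization scale-variation factor
    order_mask: &[bool],
) -> Result<FkTable, GridError> {
    // step 1: read only the general information, no operators yet
    let mut slices = EkoSlices::new(eko_path, ren1, alphas, xir, xif)?;

    // steps 2-4: walk through the operator slices one factorization scale at a
    // time; each iteration yields an (OperatorSliceInfo, Array4<f64>) pair and
    // evolves only the matching slice of the Grid, so the full 5-dimensional
    // operator never has to be in memory
    let mut evolved_slices = Vec::new();
    for (info, operator) in slices.iter_mut() {
        evolved_slices.push(grid.evolve_slice(&info, operator.view(), order_mask)?);
    }

    // combining the per-slice results into a single FkTable is elided;
    // `combine_slices` is a placeholder, not a real function
    combine_slices(evolved_slices)
}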

I think these two types would best belong in https://github.com/NNPDF/eko/pull/260, but this PR is a good playground to test them. I'm not completely convinced by the names; if you have better suggestions, let me know.

I've already written a modified version of pineappl evolve and confirmed that it's working in the one case I tested. However, I still want to support the older EKO formats; that's currently the missing bit.

cschwan commented 9 months ago

In commits c8e85f82e2876afdb0144c6fa0370c5111976c9a, 7935cea73e304a4248ed3638c2146956c69768ca and 5d1a4e9e2b25b2fbc69c503b2ebe34daf75b7fd7 I added backwards compatibility, in the sense that previous file formats are now also supported.

Since commit 695e0d5674fdf6ccfe9c41ac0e32e594249c9d2b Grid::evolve_slice now also works for DIS grids, and since commit b94ceeddeb1b1088873cd574587db23e9a416195 pineappl evolve uses Grid::evolve_slice instead of Grid::evolve. Have a look at the function evolve_grid in pineappl_cli/src/evolve.rs to understand how Pineko would need to be changed.

It seems to work quite well already, but the evolution of E906 fails - some bins in the evolved grids are zero. I will investigate this tomorrow.

cschwan commented 9 months ago

Commit b7872a77aa6530a086a28f8e29cf1a02c0957b8f fixed the evolution test with E906, for which the grid was the 'culprit'; it had some duplicated bins with the same bin limits. See also #237.

cschwan commented 9 months ago

Now that we have an Iterator over EKO slices, we can finally write the method that we wanted from the start:

impl Grid {
    pub fn evolve_with_slices<'a>(
        &self,
        eko_slices: impl IntoIterator<Item = (OperatorSliceInfo, ArrayView4<'a, f64>)>,
        order_mask: &[bool],
    ) -> Result<FkTable, GridError> {
        // implementation
    }
}

Instead of a callback we have an Iterator, of course. We can then replace Grid::evolve with Grid::evolve_with_slices, to which we pass the EkoSlices object and the order mask. To make it work in Python we probably have to replace impl IntoIterator<...> with the concrete type (EkoSlices).
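
A hedged sketch of what a call site might then look like; the EkoSlices constructor arguments are the ones assumed earlier and Grid::orders() is only used here to size the mask:

// hypothetical call site, assuming EkoSlices itself satisfies the required
// IntoIterator bound (or that the signature takes EkoSlices directly, as
// suggested for the Python bindings)
let eko_slices = EkoSlices::new(eko_path, ren1, alphas, xir, xif)?;
let order_mask = vec![true; grid.orders().len()];
let fk_table = grid.evolve_with_slices(eko_slices, &order_mask)?;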

cschwan commented 9 months ago

Before commit ef962fa3896f6b8bb0c73b40794ff2e4969308e9, evolving CMS_2JET_7TEV.pineappl.lz4 with its 2.1-GByte-sized EKO on my laptop takes

real    95m34.725s
user    95m32.242s
sys     0m1.605s

and after it

real    28m26.322s
user    28m23.897s
sys     0m1.542s

That's an improvement by a factor of more than three!

Profiling shows that roughly 72% of the CPU cycles are spent on dot and another 15% on scaled_add, both in this line:

https://github.com/NNPDF/pineappl/blob/ef962fa3896f6b8bb0c73b40794ff2e4969308e9/pineappl/src/evolution.rs#L1086

That's pretty good as well; we want the time to be spent on the linear algebra, not on the rest of the code, which is only organizational.

For this grid memory consumption is less than 150 MByte (resident).

cschwan commented 9 months ago

On my desktop PC the numbers are similar:

real    73m18.566s
user    73m17.138s
sys     0m1.340s

vs.

real    20m15.134s
user    20m13.712s
sys     0m1.380s

cschwan commented 8 months ago

Commit aa8bb9131320b296725e4f8fb0b676eacf54d670 replaces the critical line with two general_mat_mul calls, which effectively gets rid of the scaled_add call. On my desktop machine this improves

real    20m15.134s
user    20m13.712s
sys     0m1.380s

to

real    18m48.334s
user    18m46.855s
sys     0m1.441s

and on my laptop

real    28m26.322s
user    28m23.897s
sys     0m1.542s

to

real    25m48.613s
user    25m43.977s
sys     0m1.784s

That's a 7%/9% (desktop/laptop) performance improvement over the previous commit, and perf shows that roughly 92% of the CPU cycles are now spent in general_mat_mul.
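
For illustration, this is the kind of rewrite involved; the array names and shapes are made up here and differ from the actual code in evolution.rs:

use ndarray::{linalg::general_mat_mul, Array2};

// before: two dot() calls allocate intermediate arrays and a scaled_add
// accumulates the result, schematically
//     fk_slice.scaled_add(factor, &op_a.dot(subgrid).dot(&op_b.t()));
// after: the same contraction expressed as two general_mat_mul calls, where
// the second one fuses the scale-and-add into the matrix multiplication
fn contract(
    fk_slice: &mut Array2<f64>,
    op_a: &Array2<f64>,
    subgrid: &Array2<f64>,
    op_b: &Array2<f64>,
    factor: f64,
) {
    // tmp = op_a * subgrid
    let mut tmp = Array2::<f64>::zeros((op_a.nrows(), subgrid.ncols()));
    general_mat_mul(1.0, op_a, subgrid, 0.0, &mut tmp);
    // fk_slice += factor * tmp * op_b^T
    general_mat_mul(factor, &tmp, &op_b.t(), 1.0, fk_slice);
}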

cschwan commented 8 months ago

Given that most of the CPU cycles are now spent in general_mat_mul, the only way to possibly improve the performance further is to use an optimized BLAS implementation.

Here's the patch for this (using OpenBLAS):

diff --git a/pineappl/Cargo.toml b/pineappl/Cargo.toml
index 280fe4c..390f051 100644
--- a/pineappl/Cargo.toml
+++ b/pineappl/Cargo.toml
@@ -16,6 +16,7 @@ version.workspace = true
 arrayvec = "0.7.2"
 bincode = "1.3.3"
 bitflags = "1.3.2"
+blas-src = { features = ["openblas"], optional = true, version = "0.8" }
 enum_dispatch = "0.3.7"
 float-cmp = "0.9.0"
 git-version = "0.3.5"
@@ -23,6 +24,7 @@ indicatif = "0.16.2"
 itertools = "0.10.1"
 lz4_flex = "0.9.2"
 ndarray = { features = ["serde"], version = "0.15.4" }
+openblas-src = { version = "0.10", optional = true, features = ["cblas", "static"] }
 rustc-hash = "1.1.0"
 serde = { features = ["derive"], version = "1.0.130" }
 thiserror = "1.0.30"
@@ -34,3 +36,6 @@ rand = { default-features = false, version = "0.8.4" }
 rand_pcg = { default-features = false, version = "0.3.1" }
 serde_yaml = "0.9.13"
 ndarray-npy = "0.8.1"
+
+[features]
+blas = ["dep:blas-src", "ndarray/blas", "dep:openblas-src"]
diff --git a/pineappl/src/evolution.rs b/pineappl/src/evolution.rs
index f7ddc23..551b965 100644
--- a/pineappl/src/evolution.rs
+++ b/pineappl/src/evolution.rs
@@ -1,5 +1,8 @@
 //! Supporting classes and functions for [`Grid::evolve`].

+#[cfg(feature = "blas")]
+extern crate blas_src;
+
 use super::grid::{Grid, GridError, Order};
 use super::import_only_subgrid::ImportOnlySubgridV2;
 use super::lumi::LumiEntry;
diff --git a/pineappl_cli/Cargo.toml b/pineappl_cli/Cargo.toml
index c2eacf4..1599be4 100644
--- a/pineappl_cli/Cargo.toml
+++ b/pineappl_cli/Cargo.toml
@@ -50,6 +50,7 @@ rustc-args = [ "--cfg feature=\"docs-only\"" ]

 [features]
 applgrid = ["dep:pineappl_applgrid"]
+blas = ["pineappl/blas"]
 evolve = ["dep:base64", "dep:either", "dep:tar", "dep:lz4_flex", "dep:ndarray-npy", "dep:serde", "dep:serde_yaml"]
 fastnlo = ["dep:pineappl_fastnlo"]
 fktable = ["dep:flate2",  "dep:tar"]

cschwan commented 8 months ago

For reference, the desktop timing without OpenBLAS was
real    18m48.334s
user    18m46.855s
sys     0m1.441s

With OpenBLAS my desktop is 10% faster:

real    16m53.612s
user    16m52.386s
sys     0m1.740s

cschwan commented 8 months ago

My desktop has an

Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

and my laptop

Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz

cschwan commented 8 months ago

Another test: on my desktop, using OpenBLAS, I evolved ATLAS_1JET_8TEV_R06, whose EKO is 45 GB in size (888 slices). The evolution takes a bit less than 6 hours, and resident memory consumption never exceeded 500 MB. That's not pretty, but I think it's OK!

We may want to reintroduce a progress bar, for which we can now easily use ProgressBar::wrap_iter; thanks to the Iterator interface this doesn't require any modification to the evolution methods (see the sketch at the end of this comment)!

EDIT: this time really with OpenBLAS:

real    301m45.642s
user    300m24.360s
sys     0m54.963s
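
A minimal sketch of the progress-bar idea mentioned above, using indicatif's ProgressBar::wrap_iter; the call site with eko_slices and the slice count are hypothetical:

use indicatif::ProgressBar;

// wrap any iterator in a progress bar; the evolution methods never need to
// know about it
fn with_progress<I: Iterator>(it: I, len: u64) -> impl Iterator<Item = I::Item> {
    ProgressBar::new(len).wrap_iter(it)
}

// hypothetical call site, e.g. the 888 slices of the ATLAS_1JET_8TEV_R06 EKO:
// for (info, operator) in with_progress(eko_slices.iter_mut(), 888) { /* ... */ }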

cschwan commented 8 months ago

Commit 890c2fb85ae7b7198736af74c5945f5763a5666d implements the idea outlined in https://github.com/NNPDF/pineappl/pull/244#issuecomment-1761434678.

felixhekhorn commented 8 months ago

oookay, since this is getting more and more advanced let me point out some issues:

to be more specific:

eventually we would like to use this progress also in pineko, of course, and for that I foresee the following strategy:

cschwan commented 8 months ago

oookay, since this is getting more and more advanced let me point out some issues:

  • the raw linear algebra seems to be in very good shape now thanks to the efforts of @cschwan (and this is the absolute basic)

I would agree insofar as I've implemented the real core functionality, and now we need more interfaces - I think that's what you're saying.

  • unfortunately this is not the only thing we need: we're lagging behind on the interface part ... at the moment, I think, we're not able to profit from the progress made here in Python (i.e. pineko)

Yes, absolutely, this still needs to be done.

to be more specific:

  • concerning versions (see also my comment above): I'm guessing the three versions @cschwan is seeing are "EKO file v0.13", "EKO file v0.12" and "pineko interface" - do you agree?

Unfortunately I'm not sure what you call them, but I'm using three different versions, where

  1. the most recent one (Metadata::V2) of EKO is very likely "EKO file v0.13"
  2. the previous one (Metadata::V1) would probably be "EKO file v0.12"; please check this file: https://data.nnpdf.science/pineappl/test-data/CMS_TTB_8TEV_2D_TTM_TRAP_TOT.tar and tell me what version it is and
  3. the oldest EKO file format that is supported is for instance https://data.nnpdf.science/pineappl/test-data/LHCB_WP_7TEV.tar, which has one 5D operator.
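
Purely as an illustration of the three layouts just listed - the names below are mine, not the ones used by the reader in pineappl_cli/src/evolve.rs:

enum EkoFormat {
    /// most recent layout (Metadata::V2), very likely "EKO file v0.13"
    V2,
    /// previous layout (Metadata::V1), probably "EKO file v0.12"
    V1,
    /// oldest supported layout: a single 5-dimensional operator in the tarball
    Legacy,
}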

pineappl evolve requires the correctly reinterpolated EKO, and if it isn't provided it'll complain. In that sense I don't see a problem. However, if I understand you correctly, there may be some users who want the reinterpolated EKOs (which at some point may not be possible anymore) rather than the ones for some standard x grid, which you now provide.

  • the problem with the "pineko interface" is that we will not profit from the improvements made here (neither in time nor in memory), since pineko is still using the old interface (e.g. loading the full operator) - and since there is no easy workaround I strongly suggest not to address this issue in this PR, but to postpone it

Right, we'll need to rewrite Pineko to switch from Grid::evolve to use Grid::evolve_with_slice_iter.

  • to be explicit: @cschwan Grid::evolve is still working, right? we need it here

Yes, that code is completely untouched, but I plan to migrate Grid::evolve to use the new methods under the hood (that's really a matter of test coverage - the new code is untested right now), and the user shouldn't notice this. Eventually, Grid::evolve should be deprecated.

eventually we would like to use this progress also in pineko, of course, and for that I foresee the following strategy:

  • introduce eko crate (done e.g. in https://github.com/NNPDF/eko/pull/189 )

  • as suggested above maybe move some of the stuff from here to there

  • shift eko.couplings to Rust - a limitation of the current implementation is that you need to have a PDF associated, not for the comparison afterwards, but to provide αs during the creation

  • eventually shift eko.interpolation to Rust to allow on-the-fly reshaping (which will resolve the new break pointed out above)

I agree with that plan, and if we're lazy we could simply dump the code from the eko module in https://github.com/NNPDF/pineappl/blob/890c2fb85ae7b7198736af74c5945f5763a5666d/pineappl_cli/src/evolve.rs into a new Rust crate and call it a day. Realistically you'll probably want to do more, as you said.

The reshaping isn't necessarily needed in Rust; it can also be done in EKO or probably Pineko. Having it done while loading in Rust, and then loading a much smaller operator, would be the cherry on top :smile: though. I don't think pineappl evolve is too important - that's Pineko's job after all. I see it more as a tool to implement features like this one, and having everything in pineappl makes testing/debugging much easier for me.

alecandido commented 8 months ago

Unfortunately I'm not sure what you call them, but I'm using three different versions, where

1. the most recent one (`Metadata::V2`) of EKO is very likely "EKO file v0.13"

2. the previous one (`Metadata::V1`) would probably be "EKO file v0.12"; please check this file: [data.nnpdf.science/pineappl/test-data/CMS_TTB_8TEV_2D_TTM_TRAP_TOT.tar](https://data.nnpdf.science/pineappl/test-data/CMS_TTB_8TEV_2D_TTM_TRAP_TOT.tar) and tell me what version it is and

3. the oldest EKO file format that is supported is for instance [data.nnpdf.science/pineappl/test-data/LHCB_WP_7TEV.tar](https://data.nnpdf.science/pineappl/test-data/LHCB_WP_7TEV.tar), which has one 5D operator.

I agree with the general approach of keeping data versions, but this is done inconsistently in EKO itself (we are in the process of fixing that in EKO as well, but it's not ready yet: https://github.com/NNPDF/eko/issues/175).

Most likely the versions are exactly the ones you say, but I'd rather take a different route:

Consider that the data format has been more or less stabilized in EKO 0.13. It's not stable as a block, in the sense that some objects (like the runcards) are still mildly changing (and maybe even the metadata will change in 0.14, but I don't remember exactly, @felixhekhorn?). But the operators' content is as stable as is reasonable: the arrays are what they are, and there is no foreseeable need or plan to change them.

I'd rather focus on 0.13 and forget about the rest. Moreover, I'd propose to call it EKO data format v1 and start using the __data_version__. In PineAPPL you could keep using the version you are using right now and just drop the older ones (for simplicity, and to save some valuable time). As soon as https://github.com/NNPDF/eko/pull/260 rolls out, we can do proper version management on the EKO side, and PineAPPL will only have to deal with the crate version.

Of course this is just a proposal, but I tried to take into account both the short and the long term, in the most practical way (I hope).

alecandido commented 8 months ago

pineappl evolve requires the correctly reinterpolated EKO, and if it isn't provided it'll complain. In that sense I don't see a problem. However, if I understand you correctly, there may be some users who want the reinterpolated EKOs (which at some point may not be possible anymore) rather than the ones for some standard x grid, which you now provide.

  • the problem with the "pineko interface" is that we will not profit from the improvements made here (neither in time nor in memory), since pineko is still using the old interface (e.g. loading the full operator) - and since there is no easy workaround I strongly suggest not to address this issue in this PR, but to postpone it

Right, we'll need to rewrite Pineko to switch from Grid::evolve to use Grid::evolve_with_slice_iter.

If we do not have a strong immediate need, I'd avoid solving the "Pineko problem". Pineko is working - possibly in a suboptimal way and with heavy requirements, but working. Since things are complicated enough, I'd focus for a moment on PineAPPL and EKO themselves. The moment this relation is polished enough, we'll inform our users (i.e. we can move on to Pineko). Let's just take one step at a time.

(Concerning the reinterpolated EKO, it's exactly the same: we will always be able to dump the reinterpolated object and use that directly - this is just suboptimal for memory and disk, but it works now, and it will keep working until the new interface is ready.)

cschwan commented 4 months ago

Note to myself: adding a test with scale-varied EKOs should be top priority; everything else can be added afterwards.

cschwan commented 4 months ago

In commits d2a4f3d72e1edf898dbb653db848fa7ab9d10d33, b3450a5c27e65eeb088daff20e78b143bd735e0a and 6a86a88b578d953f3e8b1d352bf412e375b21283 I've removed the variables xir, xif, ren1 and alphas from OperatorSliceInfo, which reflects the changes between the old and new EKO versions. This should make the Rust EKO reader simpler.

cschwan commented 4 months ago

Commits 90d9efdda34092786a59f1cb5c4091575acc9a72 and 1e417a0f6aace1a69256a61034b1f5d6c36c51eb add a first Python interface for Grid::evolve_with_slice_iter, but I haven't been able to test it. @alecandido @felixhekhorn, if you'd like to go ahead with this please let me know, otherwise I will do it in a few days.

alecandido commented 4 months ago

I'm not going to test it myself in the short term, but thanks for this.

However, I still believe it's more promising to avoid passing arrays through the Python interface (as much as possible, of course). So, I'd rather invest in driving the Rust loader (unlike grids, the EKO is on disk, and this whole effort is about avoiding loading it entirely - so communicating through files is not a limitation).

cschwan commented 4 months ago

I've just merged this branch. The Python interface will be added in a separate PR.