Base: 91.22% // Head: 91.38% // Increases project coverage by +0.15% :tada:

Coverage data is based on head (f841dff) compared to base (5b1a3c0). Patch coverage: 92.30% of modified lines in pull request are covered.
To give you a starting point, this is where the actual convolution happens; everything else is just bookkeeping. In line 2073 the values `opa` and `opb` are two-dimensional EKOs and `arr` is the two-dimensional grid. The operations `dot` and `t` perform a matrix-matrix multiplication and a matrix transposition.
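For readers less familiar with `ndarray`, here is a minimal sketch of that contraction. The function and variable names are illustrative only, not the actual ones from `grid.rs`:

```rust
use ndarray::Array2;

// Illustrative sketch of the core contraction: `opa` and `opb` are the
// two-dimensional EKOs for the two initial states and `arr` is the
// two-dimensional (x1, x2) subgrid. `dot` is a matrix-matrix multiplication
// and `t` a (lazy, non-copying) transposition, so the evolved subgrid is
// opa * arr * opb^T.
fn evolve_subgrid(opa: &Array2<f64>, opb: &Array2<f64>, arr: &Array2<f64>) -> Array2<f64> {
    opa.dot(&arr.dot(&opb.t()))
}
```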
The amount of code is still massive (~300 lines), so can you try to split it a bit? e.g.
@felixhekhorn yes, that'll definitely happen. I will probably write a different method for DIS, because the core operation can be optimized differently as well.
> @felixhekhorn yes, that'll definitely happen. I will probably write a different method for DIS, because the core operation can be optimized differently as well.

Maybe this is also the best option, but then I'd definitely complete double hadronic first.

The moment we have it, we can check how many "units" you have, and try to split them into functions. Then, we may reuse the units to build a completely separate function for DIS (and maybe this will improve things, since we don't have to bloat one function to accommodate both).

For splitting, I'd try to look for more sensible units, rather than just cutting pieces into "pre", "during", "post", also because you might end up passing everything you create in one function to another function. If possible, let's try to have sensible reusable blocks.

I would also consider splitting the whole business into a different file (`evolve.rs`, or something like that), since this is not related to the core lifecycle of a grid, but it is a further, well-separated task.
> Maybe this is also the best option, but then I'd definitely complete double hadronic first.

Agreed.

> I would also consider splitting the whole business into a different file (`evolve.rs`, or something like that), since this is not related to the core lifecycle of a grid, but it is a further, well-separated task.

Yes, that sounds like a good idea; `grid.rs` is already very big.
Commit 58e8321538a75303f658ac2084783a7ae7260939 fixes the bug I mentioned above and now I get the same results as produced by `Grid::convolute_eko`. It's strange that the results do not agree with the ones produced before evolution, but this must be a mistake in the choice of parameters or even in the EKO itself (maybe I've copied the wrong one)? Could one of you check?
In any case the numbers of `Grid::convolute_eko` and `Grid::evolve` are numerically the same, and the speedup is promising: `Grid::convolute_eko` takes 80 seconds, `Grid::evolve` less than 0.5 seconds!
Wow! That sounds almost too good to be true.
This process might be a bit of a special case, because it has very few initial states; but on the other hand here are some good reasons that I long suspected:

- `Grid::convolute_eko` …
- `ndarray` does … for us; this avoids any kind of index checking, which is probably expensive in deeply-nested loops. We could easily switch on BLAS support in `ndarray`, which could make it even faster without changing the code!
- … `Grid::evolve`, which does require some not-so-easy chains of iterators.

@AleCandido @felixhekhorn There's another issue moving forward with this implementation: in order to keep the size of the operators small, EKO produces a muf2-dependent x grid (the x in the `Grid`, not in the `FkTable`). I don't remember all the details, but I think a potential problem will be that the EKO is sorted according to the ordering of `Grid::axes`, and that this is possibly not reconstructible from `metadata.yaml` alone (without the grid). It's difficult to describe this here, would you mind discussing this live?
With commit adc00477d5c034ed5e4117ca070caa5f7b59a1e0 I can now successfully evolve `ATLASWZRAP36PB` in less than 2 seconds! The function `Grid::convolute_eko` needs roughly 45 minutes for it.
I believe functionality-wise `Grid::evolve` is complete except for DIS support. Once I have access to `dom` again I'll do more testing.

It seems `dom` is back.
The biggest EKO that I could find is for `ATLAS_WM_JET_8TEV_PT-atlas-atlas-wjets-arxiv-1711.03296-xsec003`, and here it took 88 seconds.
I profiled `Grid::evolve` with the big EKO and here are the results: 94.40% is spent in `Grid::evolve` (the rest is the unit test loading the EKO, convolutions, etc.). Out of those 94.40% we have:

- `dot` (matrix-matrix multiplications)
- `Grid::operators`, of which
  - `Grid::as_standard_layout` and
  - `Grid::select`

If I interpret the results correctly we could speed up the evolution by 25% (for `Grid::as_standard_layout`) if we ordered the 5-dimensional EKOs differently, but I suppose it's not worth making this - very likely breaking - change in EKO/pineko.

`Grid::select` is used to select the subset of x-grid values in the EKO for the current subgrid, so this is unavoidable.
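Conceptually, that selection is just picking out a subset of points along one axis of the operator. A hedged sketch of what it amounts to, using `ndarray`'s `select` (names are illustrative, this is not the actual implementation):

```rust
use ndarray::{Array2, Axis};

// Illustrative only: keep the operator columns corresponding to the x-grid
// points that actually appear in the current subgrid. `select` copies the
// chosen columns into a new, owned array, which is why this step shows up in
// the profile at all.
fn select_x_points(op: &Array2<f64>, x_indices: &[usize]) -> Array2<f64> {
    op.select(Axis(1), x_indices)
}
```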
I think overall this looks pretty good and I don't think we need to show any progress bar given that you don't have to wait longer than 2 minutes. We should keep an eye on the grids/EKOs that take longest and eventually perform more optimization if they're needed.
I'll proceed with DIS.
> If I interpret the results correctly we could speed up the evolution by 25% (for `Grid::as_standard_layout`) if we ordered the 5-dimensional EKOs differently, but I suppose it's not worth making this - very likely breaking - change in EKO/pineko.
This I'm not sure about: it is breaking, but the layout is not final (we are going to make a big change in a while). However, it is always convenient for the Q2 dimension to be the outermost one, also because you might want to load only a subset of the Q2 slices. We can also consider caching them.
> `Grid::select` is used to select the subset of x-grid values in the EKO for the current subgrid, so this is unavoidable.
This I believe we can improve at the level of the grid: if we produce grids with a common x-grid, that operation should not be required. Maybe we can add a toggle somewhere, promising that the grid has a unique x-grid everywhere, and skip the `Grid::select` if the toggle is detected.
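Such a toggle could boil down to a check like the following (purely hypothetical helper, not existing API):

```rust
// Hypothetical helper sketching the proposed toggle: if every subgrid of a
// grid uses one and the same x-grid, the per-subgrid selection of EKO
// x-points could be skipped entirely.
fn has_common_x_grid(x_grids: &[Vec<f64>]) -> bool {
    x_grids.windows(2).all(|pair| pair[0] == pair[1])
}
```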
> I think overall this looks pretty good and I don't think we need to show any progress bar given that you don't have to wait longer than 2 minutes. We should keep an eye on the grids/EKOs that take longest and eventually perform more optimization if they're needed.
Maybe we can also make the progress bar optional: if it takes longer, it might be useful. Otherwise, we can keep it for debugging and conditionally remove it at compile time.
> This I'm not sure about: it is breaking, but the layout is not final (we are going to make a big change in a while). However, it is always convenient for the Q2 dimension to be the outermost one, also because you might want to load only a subset of the Q2 slices.
Okay, if you say the layout is not final, my proposal would be to change it from `op[mu2][pid1][x1][pid0][x0]` to `op[mu2][pid0][x0][pid1][x1]`, so just interchanging the 0 indices with the 1 indices. The difficulty that I foresee is that you probably have to make some adjustments to the EKO file format to support this change.
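For what it's worth, the interchange itself is cheap on the `ndarray` side; a sketch under the assumption of a 5-dimensional operator (names are illustrative):

```rust
use ndarray::Array5;

// Illustrative sketch: turn an operator stored as op[mu2][pid1][x1][pid0][x0]
// into the op[mu2][pid0][x0][pid1][x1] ordering. `permuted_axes` only adjusts
// the strides (no copy); `as_standard_layout` then makes the data contiguous
// in the new ordering, which is the part that actually costs time.
fn to_grid_layout(op: Array5<f64>) -> Array5<f64> {
    op.permuted_axes([0, 3, 4, 1, 2])
        .as_standard_layout()
        .to_owned()
}
```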
> so just interchanging the 0 indices with the 1 indices. The difficulty that I foresee is that you probably have to make some adjustments to the EKO file format to support this change.
The EKO file format is exactly the moving part, but the storage is just a folder with one array file for each 4-dimensional array object, i.e. one per $\mu^2$ value.

Now, I actually see room for some generalization: it makes sense that exchanging 0 with 1 is convenient here, because we want to evolve a grid. But usually we evolve PDFs.
So, my proposal is the following: I would allow for two layouts, the `pdf` and the `grid` layout (which are just the 0 and 1 interchanged). We will provide methods to convert between them (they are very simple), and possibly to compute directly in the preferred layout, in order to minimize the computing effort (but this we can also do at a later stage).

Do you believe this to be a sensible option? @felixhekhorn @cschwan
@AleCandido that sounds great. I think in the meantime (before merging this pull request) we could simply assume that the operator passed to `Grid::evolve` is in this format, and I'll check whether this change indeed improves the performance. If it doesn't, there's no need to change anything, of course.
With commit e7d162e974b955728d5dc1e22f24f93274186e5f the evolution of DIS grids should work, and while testing I found an unrelated bug that's been fixed in commit 6df9bf199c955e0d10a59425d4c4382655e3b9fd.
Note to myself: `Grid::evolve` doesn't implement the 'gluon exception', which treats pid = 0 in a `Grid` as the gluon.
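In other words, something along these lines is still missing (hypothetical helper; the convention of using 0 as an alias for the gluon's PDG id 21 is the one described above):

```rust
// Hypothetical helper: some grids use pid = 0 as an alias for the gluon,
// whose PDG Monte Carlo id is 21, so pids would have to be normalized before
// matching them against the EKO's flavour basis.
fn normalize_pid(pid: i32) -> i32 {
    if pid == 0 { 21 } else { pid }
}
```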
Mmm ... something is not working ... I had to recompute the EKO (for some reason I don't know), so that seems unlikely.
$ pineappl diff data/grids/400/NNPDF_POS_ANTI_UP_40.pineappl.lz4 data/fktables/400/NNPDF_POS_ANTI_UP_40.pineappl.lz4 NNPDF40_nnlo_as_01180 --ignore-orders --ignore-lumis
LHAPDF 6.4.0 loading /home/felix/local/share/LHAPDF/NNPDF40_nnlo_as_01180/NNPDF40_nnlo_as_01180_0000.dat
NNPDF40_nnlo_as_01180 PDF set, member #0, version 1; LHAPDF ID = 331100
b x1 x2 diff
--+-+-+------------------------+------------------------+------------+------------+---------
0 5 5 0.0000005 0.0000005 1.2405356e0 1.2207843e0 1.618e-2
1 5 5 0.0000019407667236782136 0.0000019407667236782136 1.0556043e0 1.0450481e0 1.010e-2
2 5 5 0.000007533150951473337 0.000007533150951473337 8.9828945e-1 8.9295580e-1 5.973e-3
3 5 5 0.000029240177382128657 0.000029240177382128657 7.6465325e-1 7.6168863e-1 3.892e-3
4 5 5 0.00011349672651536727 0.00011349672651536727 6.4976452e-1 6.4739982e-1 3.653e-3
5 5 5 0.00044054134013486355 0.00044054134013486355 5.4556837e-1 5.4296734e-1 4.790e-3
6 5 5 0.0017099759466766963 0.0017099759466766963 4.4285618e-1 4.3971407e-1 7.146e-3
7 5 5 0.006637328831200572 0.006637328831200572 3.4236858e-1 3.3883989e-1 1.041e-2
8 5 5 0.02576301385940815 0.02576301385940815 2.3851112e-1 2.3603616e-1 1.049e-2
9 5 5 0.1 0.1 1.0826740e-1 1.0965325e-1 -1.264e-2
10 5 5 0.18 0.18 5.3312426e-2 5.6506503e-2 -5.653e-2
11 5 5 0.26 0.26 2.2300404e-2 2.5950819e-2 -1.407e-1
12 5 5 0.33999999999999997 0.33999999999999997 7.6893571e-3 1.0936975e-2 -2.969e-1
13 5 5 0.42000000000000004 0.42000000000000004 3.0482582e-3 5.5009559e-3 -4.459e-1
14 5 5 0.5 0.5 1.7784672e-3 3.4023716e-3 -4.773e-1
15 5 5 0.58 0.58 1.6533254e-3 2.5976596e-3 -3.635e-1
16 5 5 0.66 0.66 1.8898310e-3 2.3339114e-3 -1.903e-1
17 5 5 0.74 0.74 1.4804651e-3 1.6284870e-3 -9.090e-2
18 5 5 0.82 0.82 6.6492704e-4 6.9486868e-4 -4.309e-2
19 5 5 0.9 0.9 1.4006028e-4 1.4172972e-4 -1.178e-2
@felixhekhorn `evolve_with_one` still has at least one bug; also `HERA_CC_318GEV_EP_SIGMARED`/`CHORUS_CC_NB_PB_SIGMARED` is only partially working (strangely, only some bins are wrong).
How did you produce the `FkTable`? I'd like to see your call to `Grid::evolve`. I suppose you used https://github.com/NNPDF/pineko/pull/49. Where can I get the EKO and the grid, @felixhekhorn?
@felixhekhorn I can reproduce your results, and I'm working on fixing this bug.
> Commit 58e8321 fixes the bug I mentioned above and now I get the same results as produced by `Grid::convolute_eko`. It's strange that the results do not agree with the ones produced before evolution, but this must be a mistake in the choice of parameters or even in the EKO itself (maybe I've copied the wrong one)? Could one of you check?
I don't know what happened or what was wrong, but this is no longer a problem (neither with `Grid::convolute_eko` nor with `Grid::evolve`) as of commit 5fbd46b6e3ca8ed5e81736211481b96bd1bc3567.
@felixhekhorn For `NNPDF_POS_ANTI_UP_40` I get the same numbers no matter whether I use `Grid::evolve` or `Grid::convolute_eko`. Is it possible that the EKO is incompatible with the PDF/grid?
Commit 9d9d40a5f3fec8f6b64c77e6abff62804aab002a improves the performance of the evolution (measured on my very old laptop) further:

- from 84s/58s to 27s/26s for `ATLASPH15`/`ATLAS_WM_JET_8TEV_PT`, respectively.
- Moreover, `perf` shows that 68% of the time in `Grid::evolve` is spent on `dot`, which is very good news.
- 18% is spent in `evolution::operators`, mostly in `select`.
- Only 4% in total (part of the 18%) is now spent in `as_standard_layout`, suggesting that permuting the EKO dimensions as discussed above will not significantly improve performance, so I propose not to make any changes to its layout.

Now I think the code is in very good shape, at least performance-wise. What still needs to be done is to thoroughly check it, ideally with every grid and EKO available so there are no surprises later on, and pick some representative examples for unit tests to increase code coverage.
> - from 84s/58s to 27s/26s for `ATLASPH15`/`ATLAS_WM_JET_8TEV_PT`, respectively.
> - Moreover, `perf` shows that 68% of the time in `Grid::evolve` is spent on `dot`, which is very good news.
That is incredibly promising, and given that PineAPPL is now the backbone of the pipeline, it is really important. Thank you @cschwan for all the work here, we will try to do our part in EKO.
> - Only 4% in total (part of the 18%) is now spent in `as_standard_layout`, suggesting that permuting the EKO dimensions as discussed above will not significantly improve performance, so I propose not to make any changes to its layout.
We don't have much spare time, so I definitely agree (we can keep it for later, but everything that is not a bottleneck is not a priority).
> Now I think the code is in very good shape, at least performance-wise. What still needs to be done is to thoroughly check it, ideally with every grid and EKO available so there are no surprises later on, and pick some representative examples for unit tests to increase code coverage.
Good, and in particular, having split off many subfunctions, we can now really unit test them. But yes, we definitely also want representative examples to detect possible regressions now and in the future.
With `pineappl evolve --accuracy=2e-3 ...` I tested theory 208, and with a single core on `dom` it takes 9min33s to perform all evolutions. That's the good news. However, there are a couple of grids that fail the check:
CHORUS_CC_NB_PB_SIGMARED
CHORUS_CC_NU_PB_SIGMARED
E886P
HERA_NC_225GEV_EP_SIGMARED
HERA_NC_251GEV_EP_SIGMARED
HERA_NC_300GEV_EP_SIGMARED
HERA_NC_318GEV_EAVG_SIGMARED_BOTTOM
HERA_NC_318GEV_EAVG_SIGMARED_CHARM
HERA_NC_318GEV_EP_SIGMARED
NMC_NC_EM_D_F2
NMC_NC_EM_P_F2
NMC_NC_EM_P_SIGMARED
NUTEV_CC_NB_FE_SIGMARED
NUTEV_CC_NU_FE_SIGMARED
SLAC_NC_EM_D_F2
SLAC_NC_EM_P_F2
For some I know that `Grid::convolute_eko` yields the same results. I'm using `NNPDF40_nlo_as_01180` to perform the convolutions. Is there any chance the EKOs are wrong?
The following are wrong because of the gluon exception (https://github.com/NNPDF/pineappl/pull/184#issuecomment-1277656560):
LHCB_DY_13TEV_DIELECTRON
LHCB_DY_13TEV_DIMUON
LHCB_DY_7TEV_940PB
LHCB_DY_7TEV
LHCB_DY_8TEV_2FB
LHCB_DY_8TEV
Here are the logs, please have a look: test-grid-evolve.txt
@andreab1997 you can have a look at the workflow at `.github/workflows/rust.yaml`.
What @cschwan has done from f91e7ca to 8d58423 is more or less what I was proposing to do in pineko (download the test grids/EKOs in the workflow instead of storing them in the `.git` repo). I believe we can mirror this on that side :)
The only improvement I would make is to add a shell script for the download to the repository (containing what is now in the workflow download step), so that it is simple to reproduce the workflow environment locally (without having to copy-paste from the workflow).
> The following are wrong because of the gluon exception (https://github.com/NNPDF/pineappl/pull/184#issuecomment-1277656560):
I guess that would need to be fixed before merging ...
As for the DIS grids listed above: it is not sufficient to just look at the difference between grid and FK table, since the observables also span a non-physical range. More precisely, they include points below $Q_0^2$ where we expect differences: first LHAPDF breaks down (backward evolution) and then pQCD. So we need the bin information next to the numbers - @cschwan can you produce such a list? (Both `pineappl diff` and `pineko compare` should be able to do so.) (Or else let me know where your grids/FK tables are ...)
That being said, I can do the JOIN also by hand, and most of them are indeed alright (so matching where they should and differences where we expect them). However, looking at `NUTEV_CC_NB_FE_SIGMARED` I can also see something strange, as only the first few bins are off; but they should correspond to an increasing series of Q2 points, so I wonder whether the old bug about not preserving the observable order became visible again? So the question is: which versions did you use, @cschwan?
>>> NUTEV_CC_NB_FE_SIGMARED.tar
b  FkTable      Grid         rel. diff
--+------------+------------+-------------
0  3.6074998e0  5.8221893e0   6.1391255e-1
1  7.7797554e0  9.9638287e0   2.8073804e-1
2  1.1450347e1  1.3461273e1   1.7562146e-1
3  9.7389131e0  1.0754171e1   1.0424760e-1
4  8.8859954e0  8.2236821e0  -7.4534507e-2
5  2.0471413e1  1.9412808e1  -5.1711398e-2
6  1.7321241e1  1.6906333e1  -2.3953712e-2
7  1.7655144e1  1.7655176e1   1.8078417e-6
8  1.0827803e1  1.0827640e1  -1.5123138e-5
9  7.1845983e0  7.1846006e0   3.2141418e-7
10 2.3103035e1  2.3102408e1  -2.7160240e-5
11 1.3527970e1  1.3527618e1  -2.6071438e-5
observables:
  XSNUTEVCC_charm:
  - Q2: 0.7648485767999998
    x: 0.015
    y: 0.349
  - Q2: 2.1415760150399996
    x: 0.042
    y: 0.349
  - Q2: 3.671273168639999
    x: 0.072
    y: 0.349
  - Q2: 5.812849183679999
    x: 0.114
    y: 0.349
  - Q2: 10.554910359839997
    x: 0.207
    y: 0.349
  - Q2: 1.2689035127999997
    x: 0.015
    y: 0.579
  - Q2: 3.5529298358399997
    x: 0.042
    y: 0.579
  - Q2: 6.090736861439998
    x: 0.072
    y: 0.579
  - Q2: 9.643666697279999
    x: 0.114
    y: 0.579
  - Q2: 17.510868476639995
    x: 0.207
    y: 0.579
  - Q2: 1.7006375232
    x: 0.015
    y: 0.776
  - Q2: 4.76178506496
    x: 0.042
    y: 0.776
  - Q2: 8.163060111359998
    x: 0.072
    y: 0.776
  - Q2: 12.92484517632
    x: 0.114
    y: 0.776
  - Q2: 23.468797820159995
    x: 0.207
    y: 0.776
  - Q2: 1.4116504163999999
    x: 0.015
    y: 0.349
  - Q2: 3.95262116592
    x: 0.042
    y: 0.349
> I guess that would need to be fixed before merging ...
This is fixed in commits 7b6fdb2100e1bd240d2401e3118fc3f58bce156e and 8c4274ed74b997137f48843ad08689bfac868baf.
> As for the DIS grids listed above: it is not sufficient to just look at the difference between grid and FK table, since the observables also span a non-physical range. More precisely, they include points below $Q_0^2$ where we expect differences: first LHAPDF breaks down (backward evolution) and then pQCD. So we need the bin information next to the numbers - @cschwan can you produce such a list? (Both `pineappl diff` and `pineko compare` should be able to do so.) (Or else let me know where your grids/FK tables are ...)
That makes a lot of sense. For the bin limits see test-output.txt. It seems that every bin with Q2 < 2.62892 is bad, which is exactly what you described (backward evolution). So it seems everything's OK!
> That being said, I can do the JOIN also by hand, and most of them are indeed alright (so matching where they should and differences where we expect them). However, looking at `NUTEV_CC_NB_FE_SIGMARED` I can also see something strange, as only the first few bins are off; but they should correspond to an increasing series of Q2 points, so I wonder whether the old bug about not preserving the observable order became visible again? So the question is: which versions did you use, @cschwan?
I'm not sure which version you'd like to know, I simply copied the files from theory 208.
@andreab1997 @AleCandido @felixhekhorn I consider this done and ready for merging into `master`. After merging I'll release a new version of PineAPPL so you can use `Grid::evolve` in pineko.

We've got integration tests and I tested theory 208 completely. Do you think there's anything missing?
What's clearly missing is that we don't test for evolved and scale-varied `FkTable`s. In commit 1d45b833cb40d098ab424d3d73e3f7e33be37772 I added two parameters that allow for scale variation, and this shows that the code is very likely wrong.
I merged this branch into master. We'll need further tests, but for them I'm going to create a new Issue.
We will need a release, and maybe also a deprecation plan for `Grid::convolute_eko`.
I'll remove `Grid::convolute_eko` with v0.6.0. For a release I'd like to test scale-varied `FkTable`s; this can still be wrong.
Ok, fine. But I'd like to have another intermediate release before v0.6.0, because we need to upgrade in Pineko, and it would be handy to also be able to test there with both available.
I agree, I'm working on it right now!
@andreab1997 @AleCandido @felixhekhorn

This pull request adds the method `Grid::evolve`, which is going to replace `Grid::convolute_eko`. The new method should primarily be faster, but (in the end) also easier to read and to maintain. It should also avoid using `unwrap`s and index access in general.

I also replaced `EkoInfo` with `OperatorInfo` (the name is up for debate), in which I removed the use of `GridAxes` and mainly changed the naming of the variables. Instead of naming some objects with 'target' and 'source', which is terminology that's used differently in EKO and in `Grid::convolute_eko`, I instead use `0` for objects defined in the `FkTable`/at the fitting scale and `1` for objects defined in the `Grid`/at the process scale(s); see the sketch at the end of this description.

There's also a test added, which will not work since I didn't want to upload the EKO and the grid. If you'd like to test, simply place the grid `LHCB_WP_8TEV.pineappl.lz4` and the EKO into PineAPPL's main directory. You must untar the operator and also uncompress `alphas.npy` and `operators.npy`.

Right now there are a few limitations which are documented in TODOs. I will remove them soon, of course.

If you have comments I'm happy to hear them!
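To make the 0/1 naming convention concrete, here is a rough, hypothetical sketch of the kind of fields involved; this is not the actual `OperatorInfo` definition from this pull request:

```rust
// Hypothetical sketch of the 0/1 naming convention, not the actual struct:
// quantities with suffix 0 describe the FkTable side (fitting scale),
// quantities with suffix 1 describe the Grid side (process scales).
pub struct OperatorInfoSketch {
    /// squared fitting scale at which the `FkTable` is defined
    pub fac0: f64,
    /// PDG ids of the `FkTable` flavour basis
    pub pids0: Vec<i32>,
    /// x-grid points of the `FkTable`
    pub x0: Vec<f64>,
    /// squared factorization scales appearing in the `Grid`
    pub fac1: Vec<f64>,
    /// PDG ids of the `Grid` flavour basis
    pub pids1: Vec<i32>,
    /// x-grid points of the `Grid`
    pub x1: Vec<f64>,
}
```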