alecandido closed this pull request 2 years ago
@felixhekhorn can you test on your example?
Merging #182 (10d5ccd) into master (7851ebc) will increase coverage by 0.01%. The diff coverage is 77.77%.
@@ Coverage Diff @@
## master #182 +/- ##
==========================================
+ Coverage 89.32% 89.33% +0.01%
==========================================
Files 47 47
Lines 6932 6930 -2
==========================================
- Hits 6192 6191 -1
+ Misses 740 739 -1
Flag | Coverage Δ | |
---|---|---|
python | 80.64% <ø> (ø) | |
rust | 89.45% <77.77%> (+0.01%) | :arrow_up: |

Flags with carried forward coverage won't be shown.

Impacted Files | Coverage Δ | |
---|---|---|
pineappl/src/grid.rs | 87.45% <77.77%> (+0.06%) | :arrow_up: |
With this branch I get
$ pineappl diff data/grids/208/ATLASWZRAP36PB-ATLAS-arXiv\:1109.5141-Z0_eta34.pineappl.lz4 data/fktables/208/ATLASWZRAP36PB-ATLAS-arXiv\:1109.5141-Z0_eta34.pineappl.lz4 NNPDF40_nlo_as_01180 --ignore-orders --ignore-lumis
LHAPDF 6.4.0 loading /home/felix/local/share/LHAPDF/NNPDF40_nlo_as_01180/NNPDF40_nlo_as_01180_0000.dat
NNPDF40_nlo_as_01180 PDF set, member #0, version 1; LHAPDF ID = 331700
b x1 diff
-+---+---+-----------+-----------+--------
0 0 0.4 1.3463245e5 1.3450066e5 9.798e-4
1 0.4 0.8 1.3347254e5 1.3334425e5 9.621e-4
2 0.8 1.2 1.3077692e5 1.3066370e5 8.665e-4
3 1.2 1.6 1.2699118e5 1.2689062e5 7.925e-4
4 1.6 2 1.2116815e5 1.2109076e5 6.391e-4
5 2 2.4 1.1276176e5 1.1269892e5 5.575e-4
6 2.4 2.8 9.8484125e4 9.8421191e4 6.394e-4
7 2.8 3.6 5.7843820e4 5.7813500e4 5.244e-4
which seems to catch the problem - we're back at permille accuracy!
I suggest, @andreab1997, that you recompute (whenever you have time, of course) the problematic grids in https://github.com/NNPDF/pineko/issues/62 and confirm or deny the conclusion.
I can also partly confirm another comment of @andreab1997's: this took ~2h to compute, which he observed for the good commits as well (in contrast to the "short runs" which resulted in wrong FK tables).
Ok very good, thank you very much for this. As soon as I have time I will run a lot of tests to really be sure, but I would say that most likely we found the issue.
> I can also partly confirm another comment of @andreab1997's: this took ~2h to compute, which he observed for the good commits as well (in contrast to the "short runs" which resulted in wrong FK tables)

@cschwan can we maybe consider an optimization for the grid, in which we reinterpolate all the subgrids on a common x grid? For sure this will bring some loss in accuracy, but it might be an acceptable one (against a considerable speed-up).
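To make the proposal concrete, here is a minimal sketch of what reinterpolating subgrid values onto a common x grid could look like. The helper names are hypothetical (this is not the PineAPPL API), and global Lagrange interpolation is used purely for illustration:

```rust
/// Value of the j-th Lagrange basis polynomial, built on `nodes`, at `x`.
fn lagrange_basis(nodes: &[f64], j: usize, x: f64) -> f64 {
    nodes
        .iter()
        .enumerate()
        .filter(|&(k, _)| k != j)
        .fold(1.0, |acc, (_, &xk)| acc * (x - xk) / (nodes[j] - xk))
}

/// Reinterpolate values known on `old_nodes` onto `new_nodes`.
fn reinterpolate(old_nodes: &[f64], values: &[f64], new_nodes: &[f64]) -> Vec<f64> {
    new_nodes
        .iter()
        .map(|&x| {
            (0..old_nodes.len())
                .map(|j| lagrange_basis(old_nodes, j, x) * values[j])
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    // two subgrids with different x grids, and a common grid containing both
    let xa = [0.2, 0.4, 0.6, 0.8];
    let xb = [0.25, 0.45, 0.65, 0.85];
    let mut common: Vec<f64> = xa.iter().chain(xb.iter()).copied().collect();
    common.sort_by(|a, b| a.partial_cmp(b).unwrap());

    // toy weights for subgrid A, rebuilt on the common grid
    let va: Vec<f64> = xa.iter().map(|x| x * x).collect();
    println!("{:?}", reinterpolate(&xa, &va, &common));
}
```

The accuracy loss mentioned above would come precisely from this rebuilding step; how coarse the common grid can be is then something to measure.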
@andreab1997 @felixhekhorn as a guiding principle: all grids converted from APPLgrids are potentially affected by this bug; grids generated from pinefarm (with yadism or MadGraph5_aMC@NLO) should be OK.
> @cschwan can we maybe consider an optimization for the grid, in which we reinterpolate all the subgrids on a common x grid? For sure this will bring some loss in accuracy, but it might be an acceptable one (against a considerable speed-up).
As far as I understand this won't help (yet): `Grid::convolute_eko` is simply inherently slow. However, I've written a realistic unit test and with that I'll be able to improve the performance. Monday is a holiday in Germany and by then I hopefully have something faster!
> As far as I understand this won't help (yet): `Grid::convolute_eko` is simply inherently slow.
The idea would be to trash this PR and rely on the assumption that they are all the same. According to @felixhekhorn

> this took ~2h to compute, which he observed for the good commits as well (in contrast to the "short runs" which resulted in wrong FK tables)

so this bug fix is heavily affecting performance. If we can live without it, so much the better.
> The idea would be to trash this PR and rely on the assumption that they are all the same.
This I don't understand: why wouldn't we want to merge this pull request?
> so this bug fix is heavily affecting performance. If we can live without it, so much the better.
I can always write a program that calculates the wrong results very quickly, but that isn't very helpful, of course.
> The idea would be to trash this PR and rely on the assumption that they are all the same.
Hm? If they are coming from the outside (meaning APPLgrid) there is not much we can do, is there? Reinterpolating is an option, but not the short-term answer.
> I suggest, @andreab1997, that you recompute (whenever you have time, of course) the problematic grids in NNPDF/pineko#62 and confirm or deny the conclusion
I am now computing all the FK tables listed under "Things to check" in https://github.com/NNPDF/pineko/issues/62 with this branch, and I am ticking the items one by one as I check them. For the moment the three that I have computed are fine, so I believe the others will be fine as well. Thank you again!
> This I don't understand: why wouldn't we want to merge this pull request? I can always write a program that calculates the wrong results very quickly, but that isn't very helpful, of course.
Of course we want correct results, but we also want to be fast. That's why I proposed reinterpolation: if we can transform the subgrids such that they all share a common x grid, we will lose a bit of precision, but for a great gain in performance.
fwiw the more I think about it the more I am convinced that what we want at the fit level is a common x grid. The fitting problem becomes a lot easier to reason about then.
Measuring how fine the grid needs to be is relatively easy.
> fwiw the more I think about it the more I am convinced that what we want at the fit level is a common x grid. The fitting problem becomes a lot easier to reason about then.
I believe we all agree that this is definitely a good thing. But that concerns the x grid in the FkTable, which comes from the EKO output, and it is already guaranteed.
Here we are talking about the EKO input (eko is already reinterpolating, so it allows an input x grid different from the output one), which has to be matched to the equivalent of an APPLgrid.
Unfortunately, it seems that not all the old APPLgrids were using a single x grid for all the subgrids...
By the way @cschwan, I believe that the current status (this PR) might also be wrong: if subgrid A has a different x grid than subgrid B, in order to use a single EKO we have to compute it on the union (since we are not reinterpolating it on the fly). But then each subgrid is convoluted only over a subset of the EKO, and this should be wrong: even if the points are a subset of those in the extended grid, the interpolation basis on the subset is not a subset of the basis on the union, because the support of each polynomial in the smaller basis has to be larger than that of the corresponding polynomial in the union basis. Thus I believe we need reinterpolation, and also what we were doing before might have been wrong (and now I doubt also what APFELcomb was doing on those grids...)
@AleCandido I think what you're probably doing is to calculate the EKO for each bin (the x grids don't change over luminosities and there's only one order) and then merge all EKOs. I think in this case there shouldn't be a problem, since most likely the x grids for each bin are all pairwise disjoint sets. If this isn't the case, merging EKOs for different bins should fail. In other words there's probably no problem.
> I think what you're probably doing is to calculate the EKO for each bin (the x grids don't change over luminosities and there's only one order) and then merge all EKOs.
Don't think so, since we rely on grid information to construct the EKO, in particular on `GridAxes`, which carries a single `x_grid`:

https://github.com/N3PDF/pineappl/blob/ad22780918c4452cca0c4b6ddded6a600237f143/pineappl/src/grid.rs#L355-L357

So, given a grid, PineAPPL always returns us a single x grid, not a bin-dependent one. The only thing we can do at that point is to compute one single EKO for that single x grid.
And here is the proof that we are computing the EKO for the union x grid: https://github.com/N3PDF/pineappl/blob/ad22780918c4452cca0c4b6ddded6a600237f143/pineappl/src/grid.rs#L1369-L1378 (and not the union of the EKOs for each individual x grid). The key is, of course, `x_grid.extend_from_slice()`.
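For the record, a simplified paraphrase of that construction (the real logic lives in the `grid.rs` lines linked above; the function name and the toy values here are only illustrative):

```rust
/// Build the single x grid handed to EKO: the union of all subgrid x grids.
fn union_x_grid(subgrid_x_grids: &[Vec<f64>]) -> Vec<f64> {
    let mut x_grid: Vec<f64> = Vec::new();
    for xs in subgrid_x_grids {
        // the `x_grid.extend_from_slice()` call singled out above
        x_grid.extend_from_slice(xs);
    }
    x_grid.sort_by(|a, b| a.partial_cmp(b).unwrap());
    x_grid.dedup();
    x_grid
}

fn main() {
    let per_subgrid = vec![vec![0.1, 0.2, 0.3], vec![0.15, 0.2, 0.25]];
    // a single EKO is then requested for this union, not one per subgrid
    println!("{:?}", union_x_grid(&per_subgrid));
}
```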
> Thus I believe we need reinterpolation, and also what we were doing before might have been wrong (and now I doubt also what APFELcomb was doing on those grids...)
I don't believe this is a problem, because we're already at the level of matrix multiplication (after the grid), so the interpolation only plays a role inside the grid. Afterwards it is just linear algebra: eko is exact on the grid points (by construction), and so the information carried from the input to any output point is exact.
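Schematically (toy sizes and values, purely to illustrate the statement above): once the grid values are read out on the input x grid, applying the EKO is a plain matrix product:

```rust
fn main() {
    // eko[a][b]: 2 output x-grid points, 3 input x-grid points (toy values)
    let eko = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.25]];
    // grid values on the input x-grid points
    let f_in = [3.0, 2.0, 1.0];

    // f_out[a] = sum_b eko[a][b] * f_in[b]
    let f_out: Vec<f64> = eko
        .iter()
        .map(|row| row.iter().zip(&f_in).map(|(e, f)| e * f).sum::<f64>())
        .collect();

    // exact on the EKO's own grid points, by construction
    println!("{:?}", f_out);
}
```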
Let's say you have a grid with two bins for DIS. The first bin has this x grid:

`[0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]`

and the second bin:

`[0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]`

which means you calculate the EKO for the union of the x grid points given above. For the first bin that means the evolution is done using a subset (the first set), while the other x grid points contribute with zero. I don't see anything wrong with that; the interpolation should still work.
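A tiny sketch of that picture with the two x grids above (toy subgrid values, nothing PineAPPL-specific, just spelling out the zero-padding):

```rust
fn main() {
    let bin1 = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8];
    let bin2 = [0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85];

    // x grid the EKO is computed for: the union of both bins
    let mut union_grid: Vec<f64> = bin1.iter().chain(bin2.iter()).copied().collect();
    union_grid.sort_by(|a, b| a.partial_cmp(b).unwrap());

    // the first bin's subgrid seen on the union grid: its own nodes keep their
    // (toy) values, the nodes belonging only to the second bin enter as zero
    let bin1_vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0];
    let padded: Vec<f64> = union_grid
        .iter()
        .map(|x| bin1.iter().position(|n| n == x).map_or(0.0, |i| bin1_vals[i]))
        .collect();

    println!("{:?}", padded);
}
```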
> Afterwards it is just linear algebra: eko is exact on the grid points (by construction), and so the information carried from the input to any output point is exact.
One of the two is exact, this is true: matrix multiplication is always a contraction over two paired, but different, indices:

1. an index on which the object is evaluated exactly at the grid points;
2. an index on which the object carries the coefficients over an interpolation basis.

This only works if the contracted index of the EKO is of kind 1.; can you convince me of this? @felixhekhorn
> you calculate the EKO for the union of the x grid points given above. For the first bin that means the evolution is done using a subset (the first set), while the other x grid points contribute with zero
@cschwan the problem is there for one of the EKO indices (input or output, it is always confusing to me). On one of the two, the EKO is the set of coefficients over a basis of functions: if you compute the coefficients of a given function over an extended basis, but then only use a subset of them, you are effectively throwing the others away, i.e. you are saying that they should be multiplied by zeros. But that is not true: if the yadism function is evaluated only at a limited subset of points, it doesn't mean that it is zero elsewhere.
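A toy numerical illustration of that concern (global Lagrange interpolation for simplicity, which is not PineAPPL's actual interpolation; grids, values and function are made up): interpolating a function that is only known on the subset nodes gives a different answer once its values are zero-padded onto the union grid and the union basis is used.

```rust
/// Value of the j-th Lagrange basis polynomial, built on `nodes`, at `x`.
fn lagrange_basis(nodes: &[f64], j: usize, x: f64) -> f64 {
    nodes
        .iter()
        .enumerate()
        .filter(|&(k, _)| k != j)
        .fold(1.0, |acc, (_, &xk)| acc * (x - xk) / (nodes[j] - xk))
}

fn interpolate(nodes: &[f64], values: &[f64], x: f64) -> f64 {
    (0..nodes.len())
        .map(|j| lagrange_basis(nodes, j, x) * values[j])
        .sum()
}

fn main() {
    let subset = [0.2, 0.4, 0.6, 0.8];
    let union_grid = [0.2, 0.25, 0.4, 0.45, 0.6, 0.65, 0.8, 0.85];

    // f(x) = x^2 is only known on the subset nodes
    let on_subset: Vec<f64> = subset.iter().map(|x| x * x).collect();
    // the same values zero-padded onto the union grid
    let padded: Vec<f64> = union_grid
        .iter()
        .map(|x| {
            subset
                .iter()
                .position(|n| n == x)
                .map_or(0.0, |i| on_subset[i])
        })
        .collect();

    let x = 0.33;
    println!("exact             : {}", x * x);
    println!("subset basis      : {}", interpolate(&subset, &on_subset, x));
    println!("union basis, zeros: {}", interpolate(&union_grid, &padded, x));
}
```

With zeros at the extra nodes the union-basis result moves away from both the exact value and the subset-basis one, which is the point about the yadism function not actually being zero there; whether the real convolution falls into this situation is exactly the question raised above.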
@AleCandido if we use `data/grids/208/ATLASWZRAP36PB-ATLAS-arXiv:1109.5141-Z0_eta34.pineappl.lz4`, delete everything but, say, the first bin, and then evolve the grid with that one bin, we should be able to see what happens and how big the difference is.
Sorry to step into the discussion, but I just wanted to add a detail. It seems that for the FK tables in the list "Things with a conjectured solution" the conjectured solution was wrong, because with this branch I got the correct result.
@andreab1997 so if I understand you then in every case you've seen so far the differences between the grids and the corresponding FK tables are small? That would be reassuring.
> we should be able to see what happens and how big the difference is.
Let me think about it for a moment, and let's wait for @felixhekhorn's reply.

In any case, I don't believe the exercise to be extremely relevant: either it is correct, or it is wrong. "Empirically correct" tells us little, because I know that if it is wrong I can make the result arbitrarily different, so we would be relying on the specific x grids chosen (and they are not chosen by us). Moreover, the definition of "empirically correct" is based on agreement with APFELcomb, but we may well have perfect agreement and simply both be wrong in the same way.
@AleCandido we had a similar discussion here: https://github.com/N3PDF/pineappl/issues/151.
> @andreab1997 so if I understand you then in every case you've seen so far the differences between the grids and the corresponding FK tables are small? That would be reassuring.
Yes, exactly: most of the time under the permille level. Sometimes the error is slightly above 1 permille, but never more than that.
> @AleCandido we had a similar discussion here: #151.
In that case, we were considering different grids. Now the problem is within a single grid. But I agree it is a similar problem.
> "Empirically correct" tells us little [...] we may well have perfect agreement and simply both be wrong in the same way.
If in all the cases we've seen so far the differences are small, then I believe (which is a rather weak statement, I admit) the interpolation can't be too far off, and at the very least we probably don't have to hold off on producing fits with the new pipeline.
In any case I agree we should test it thoroughly, and being correct is paramount!
Since we are now pretty sure that this was the source of the bug, @cschwan, are you going to release a new version? Or do you want to wait and improve the performance first? For me it makes essentially no difference (although it is obviously simpler with a release).
@andreab1997 I'll probably release a new version at the end of today.
@AleCandido @felixhekhorn given that we fixed the initial problem, are you OK with merging this pull request?
> @AleCandido @felixhekhorn given that we fixed the initial problem, are you OK with merging this pull request?
Yes, let's merge this one and go back to the former status. We can keep discussing the rest separately.
@andreab1997 I've just released v0.5.6, which includes this fix. Crates and packages should become available soon!
Ok thank you very much!
I reverted https://github.com/N3PDF/pineappl/commit/b7014fdc4ecf9ed3ef15f08cc47ee5761ed0b471, as @cschwan suspects this to be the issue: https://github.com/N3PDF/pineappl/pull/103#discussion_r984360664.