Closed by felixhekhorn 1 year ago.
Merging #178 (4c860c2) into master (7e68241) will decrease coverage by 0.66%. The diff coverage is n/a.
```
@@            Coverage Diff             @@
##           master     #178      +/-   ##
==========================================
- Coverage   89.74%   89.08%   -0.67%
==========================================
  Files          47       47
  Lines        7432     6935     -497
==========================================
- Hits         6670     6178     -492
+ Misses        762      757       -5
```
| Flag | Coverage Δ | |
|---|---|---|
| python | `80.64% <ø> (ø)` | |
| rust | `89.19% <ø> (-0.67%)` | :arrow_down: |
Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| pineappl_cli/src/optimize.rs | `60.00% <0.00%> (-14.20%)` | :arrow_down: |
| pineappl_cli/src/upgrade.rs | `76.92% <0.00%> (-10.58%)` | :arrow_down: |
| pineappl_cli/src/sum.rs | `77.77% <0.00%> (-9.32%)` | :arrow_down: |
| pineappl_cli/src/delete.rs | `84.21% <0.00%> (-9.13%)` | :arrow_down: |
| pineappl_fastnlo/src/lib.rs | `80.00% <0.00%> (-8.53%)` | :arrow_down: |
| pineappl_cli/src/ops.rs | `84.61% <0.00%> (-6.40%)` | :arrow_down: |
| pineappl_cli/src/info.rs | `85.29% <0.00%> (-3.84%)` | :arrow_down: |
| pineappl_cli/src/subgrids.rs | `85.84% <0.00%> (-3.71%)` | :arrow_down: |
| pineappl_cli/src/merge.rs | `92.85% <0.00%> (-3.44%)` | :arrow_down: |
| pineappl_cli/src/obl.rs | `92.00% <0.00%> (-1.94%)` | :arrow_down: |
| ... and 12 more | | |
Actually, @cschwan, there is something I don't understand: how can `pineappl_grid_fill` be used to fill multi-dimensional grids? My generator computes structure functions, so the grid will by default be two-dimensional (x, Q2), and I can easily make it higher-dimensional (say, plus heavy-quark rapidity).
Bins (the second dimension of the grid, according to the `obl` sorting) are actually a multi-index. So you can stuff in there as many dimensions as you wish, as long as they correspond to external (measured) kinematics.
@felixhekhorn basically what @AleCandido said: for n-dimensional distributions you need to map them to a 1-dimensional one, essentially using bins; after having filled the grid you can then remap, using `pineappl_grid_set_remapper`, the 1-dimensional distribution back to your original n-dimensional one.
I decided to do it that way because otherwise you'd need different `Grid::fill` functions depending on how many dimensions you have; but then you could choose the wrong one, and what do you do then? I'm not convinced this is the best approach, but for the time being that's what we have. Another deciding factor was that most Monte Carlos only support one-dimensional distributions, so you have to map to bins anyway.
Opening/closing: that was me having butterfingers :smile: .
> I'm not convinced this is the best approach, but for the time being that's what we have.
In a sense it is the common approach: even NumPy and other array libraries always store one-dimensional data, simply because the memory model is linear. Usually, multi-dimensionality is achieved by keeping shape and stride metadata on top of the flat buffer.
Maybe the bins machinery can be improved, and maybe we can think about a better way to maintain coordinates, possibly leveraging an external array crate like `ndarray`.
But the grid itself you wanted to manage on your own, because it is already a complicated data structure. The only other option I see is to populate an `NDArray` of "partial grids" (i.e. order, lumi, and subgrids), one for each bin.
> you need to map them to a 1-dimensional one using essentially bins, and after having filled the grid you can remap
Okay, I see, and I understand the technical limitations ... but then I wonder whether I should provide some sugar at this level. In any case I will first complete the example, then attempt my implementation, and if I feel I need more, do more.
> [..] and it can then be used in the future)
In the sense of `pinefarm`?
> In the sense of `pinefarm`?
If there were interest (EIC, maybe? the HERA stuff is mostly already in as combined), why not ... (and keep in mind that for the fully inclusive case yadism is already using my code under the hood). Otherwise it's mostly for me (and my collaborators) and my studies ...
The current implementation reproduces one-to-one the C counterpart (now moved into `examples/c`).
Why I believe this is a good idea:

- no need to remember to call `delete` in the example file; this is done once and forever in the wrapper
- `pineappl_grid_set_key_value` -> `PineAPPL::Grid::set_key_value` ("Namespaces are one honking great idea -- let's do more of those!")
- `Order` as a struct

Things that could be improved:

- the interface is not complete, but just makes the example work again - if I'm allowed, I'd rather do the completion upon request at a later time; as said, the second step is to actually use it, and in doing so I'll discover new functions that are needed (e.g. `pineappl_grid_set_remapper`)
- I use `std::list` and `std::vector` in parallel - I usually prefer `std::list` since it is the "simplest implementation", as it just requires pointers; however, sometimes `std::vector` is also nice to have, e.g. for bins, for which `at` is quite nice; should I do something about this?
> - the interface is not complete, but just makes the example work again - if I'm allowed, I'd rather do the completion upon request at a later time; as said, the second step is to actually use it, and in doing so I'll discover new functions that are needed (e.g. `pineappl_grid_set_remapper`)
This is exactly what we are doing for the Python wrapper. If it works there, it can work here too :)
> - I use `std::list` and `std::vector` in parallel - I usually prefer `std::list` since it is the "simplest implementation", as it just requires pointers; however, sometimes `std::vector` is also nice to have, e.g. for bins, for which `at` is quite nice; should I do something about this?
This I would not do: according to Stroustrup (I read it in A Tour of C++, which should actually be just a part of The C++ Programming Language), always use a `std::vector` unless you have a very good reason not to.
E.g., if you want to implement a deque, it is much better to use a `std::list` (most of the time, at least). The same goes if you have to insert many elements in the middle, and things like that.
But if you are not 90-100% convinced that your object has to be something else, then use `std::vector`; it is by far the most efficient.
In almost all cases `std::vector<T>` is preferable because it lays out its elements contiguously in memory, and so you can easily interface with C by using the `data` member function.
> In almost all cases `std::vector<T>` is preferable because it lays out its elements contiguously in memory, and so you can easily interface with C by using the `data` member function.
that's why I used `vector` some of the time :innocent: - however, everything moved to `vector` in https://github.com/N3PDF/pineappl/pull/178/commits/16c5b7f193df4dd46cd49e473a541f4850073cb3
> In almost all cases `std::vector<T>` is preferable because it lays out its elements contiguously in memory, and so you can easily interface with C by using the `data` member function.
That might be one reason, but the most quoted one concerns memory management and operations: in practice `std::vector` has the best heuristic to limit the number of allocations, and it has O(1) access performance. (But, for example, it is rather worse for insertion.)
Ah sorry, I also compiled and ran it, and I can get a grid. The only weird thing is that I get practically the same output for the C and C++ examples, but not the same grid: they differ as binaries (a bit unexpected, but mostly uninformative), and I cannot compare them with the CLI because of the following reason:

```
Error: bins limits differ
```
> Ah sorry, I also compiled and ran it, and I can get a grid. The only weird thing is that I get practically the same output for the C and C++ examples, but not the same grid: they differ as binaries (a bit unexpected, but mostly uninformative), and I cannot compare them with the CLI because of the following reason:
>
> ```
> Error: bins limits differ
> ```
this is due to float arithmetic - I use an iterator to compute the bins, Christopher a fixed list:
```
$ pineappl diff c/DY-LO-AA.pineappl.lz4 cpp/DY-LO-AA.pineappl.lz4 NNPDF31_nlo_as_0118_luxqed --ignore-bin-limits
LHAPDF 6.4.0 loading /home/felix/local/share/LHAPDF/NNPDF31_nlo_as_0118_luxqed/NNPDF31_nlo_as_0118_luxqed_0000.dat
NNPDF31_nlo_as_0118_luxqed PDF set, member #0, version 2; LHAPDF ID = 324900
b    x1        O(as^0 a^2)
--+---+---+------------+------------+----------
 0   0 0.1 5.2631092e-1 5.2631092e-1  0.000e0
 1 0.1 0.2 5.2549083e-1 5.2549083e-1  0.000e0
 2 0.2 0.3 5.2468241e-1 5.2468241e-1  6.661e-16
 3 0.3 0.4 5.1883402e-1 5.1883402e-1 -6.661e-16
 4 0.4 0.5 5.1754819e-1 5.1754819e-1  0.000e0
 5 0.5 0.6 5.0088406e-1 5.0088406e-1  0.000e0
 6 0.6 0.7 4.9053252e-1 4.9053252e-1  0.000e0
 7 0.7 0.8 4.6757338e-1 4.6757338e-1 -1.110e-15
 8 0.8 0.9 4.3931594e-1 4.3931594e-1  0.000e0
 9 0.9   1 3.9929209e-1 3.9929209e-1  0.000e0
10   1 1.1 3.7068008e-1 3.7068008e-1 -9.992e-16
11 1.1 1.2 3.2647167e-1 3.2647167e-1  2.220e-15
12 1.2 1.3 2.8493445e-1 2.8493445e-1  0.000e0
13 1.3 1.4 2.4867234e-1 2.4867234e-1  2.220e-15
14 1.4 1.5 2.1104193e-1 2.1104193e-1  0.000e0
15 1.5 1.6 1.7974387e-1 1.7974387e-1  0.000e0
16 1.6 1.7 1.4714925e-1 1.4714925e-1  2.220e-15
17 1.7 1.8 1.2055655e-1 1.2055655e-1  0.000e0
18 1.8 1.9 9.4916254e-2 9.4916254e-2  2.220e-15
19 1.9   2 7.2557198e-2 7.2557198e-2 -2.109e-15
20   2 2.1 5.0569670e-2 5.0569670e-2  0.000e0
21 2.1 2.2 3.4917881e-2 3.4917881e-2  0.000e0
22 2.2 2.3 1.9675180e-2 1.9675180e-2  4.441e-15
23 2.3 2.4 5.5653059e-3 5.5653059e-3  0.000e0
Thanks for using LHAPDF 6.4.0. Please make sure to cite the paper:
  Eur.Phys.J. C75 (2015) 3, 132  (http://arxiv.org/abs/1412.7420)
```
I'm not as experienced as @cschwan in C++; the only thing I can reasonably comment on is the following: you're writing the C++ interface wrapping the C one, but at this point it is just sugar for the user, with the same (and possibly more) limitations as the C API.
I would consider building it directly on top of the Rust one, i.e. distributing another crate, `pineappl_cppapi`. But only if you can limit the amount of boilerplate, because if the two folders have too much in common it makes no sense (we would just be duplicating).
I was thinking about making a similar comment early on, but I think having this code for users more familiar with C++ than C is a good idea; they can simply copy it. It's similar to the Fortran module that we also have in an example directory.
> this is due to float arithmetic - I use an iterator to compute the bins, Christopher a fixed list:
This is because we perform floating-point comparisons without any tolerance. If you run `cargo clippy`, there are numerous warnings about this, which I intend to fix at some point. They could become really problematic if you think about `Grid::convolute_eko` and the `x`-grid points.
Now, I have

```
$ md5sum DY-LO-AA.pineappl.lz4
22c711a9b85f75936cd3d21d1cab4d11  DY-LO-AA.pineappl.lz4
$ md5sum ../basic-capi-usage/DY-LO-AA.pineappl.lz4
e3785e3a8f145c5f9a960e259b5e71a3  ../basic-capi-usage/DY-LO-AA.pineappl.lz4
$ pineappl diff DY-LO-AA.pineappl.lz4 ../basic-capi-usage/DY-LO-AA.pineappl.lz4 NNPDF31_nlo_as_0118_luxqed
LHAPDF 6.4.0 loading /home/felix/local/share/LHAPDF/NNPDF31_nlo_as_0118_luxqed/NNPDF31_nlo_as_0118_luxqed_0000.dat
NNPDF31_nlo_as_0118_luxqed PDF set, member #0, version 2; LHAPDF ID = 324900
b    x1        O(as^0 a^2)
--+---+---+------------+------------+-------
 0   0 0.1 5.2631092e-1 5.2631092e-1 0.000e0
 1 0.1 0.2 5.2549083e-1 5.2549083e-1 0.000e0
 2 0.2 0.3 5.2468241e-1 5.2468241e-1 0.000e0
 3 0.3 0.4 5.1883402e-1 5.1883402e-1 0.000e0
 4 0.4 0.5 5.1754819e-1 5.1754819e-1 0.000e0
 5 0.5 0.6 5.0088406e-1 5.0088406e-1 0.000e0
 6 0.6 0.7 4.9053252e-1 4.9053252e-1 0.000e0
 7 0.7 0.8 4.6757338e-1 4.6757338e-1 0.000e0
 8 0.8 0.9 4.3931594e-1 4.3931594e-1 0.000e0
 9 0.9   1 3.9929209e-1 3.9929209e-1 0.000e0
10   1 1.1 3.7068008e-1 3.7068008e-1 0.000e0
11 1.1 1.2 3.2647167e-1 3.2647167e-1 0.000e0
12 1.2 1.3 2.8493445e-1 2.8493445e-1 0.000e0
13 1.3 1.4 2.4867234e-1 2.4867234e-1 0.000e0
14 1.4 1.5 2.1104193e-1 2.1104193e-1 0.000e0
15 1.5 1.6 1.7974387e-1 1.7974387e-1 0.000e0
16 1.6 1.7 1.4714925e-1 1.4714925e-1 0.000e0
17 1.7 1.8 1.2055655e-1 1.2055655e-1 0.000e0
18 1.8 1.9 9.4916254e-2 9.4916254e-2 0.000e0
19 1.9   2 7.2557198e-2 7.2557198e-2 0.000e0
20   2 2.1 5.0569670e-2 5.0569670e-2 0.000e0
21 2.1 2.2 3.4917881e-2 3.4917881e-2 0.000e0
22 2.2 2.3 1.9675180e-2 1.9675180e-2 0.000e0
23 2.3 2.4 5.5653059e-3 5.5653059e-3 0.000e0
```
So they are not the exact same binary, but for `pineappl` they are the same.
@felixhekhorn if you like you can compare the uncompressed grids, they might be the same. It's likely that lz4 saves a date somewhere.
> @felixhekhorn if you like you can compare the uncompressed grids, they might be the same. It's likely that lz4 saves a date somewhere.
Good idea, but no - the md5 sum is still different ...
@cschwan I believe I have addressed everything from the review, so I'd be happy to merge - however, it seems I somehow broke the Fortran example in the workflow (locally there is no problem) and I've no idea how. Do you?
@felixhekhorn have a look at it again. If you have `clang-format`, could you please run it over the code? Otherwise it can be merged now!
Apart from stupid stuff, cpplint complains about:

```
PineAPPL.hpp:10: Found C system header after C++ system header. Should be: PineAPPL.h, c system, c++ system, other. [build/include_order] [4]
PineAPPL.hpp:231: Is this a non-const reference? If so, make const or use a pointer: LHAPDF::PDF &pdf [runtime/references] [2]
```

should I do something?
> Apart from stupid stuff, cpplint complains about:
>
> ```
> PineAPPL.hpp:10: Found C system header after C++ system header. Should be: PineAPPL.h, c system, c++ system, other. [build/include_order] [4]
> PineAPPL.hpp:231: Is this a non-const reference? If so, make const or use a pointer: LHAPDF::PDF &pdf [runtime/references] [2]
> ```
>
> should I do something?
The first one you can follow, what's the reasoning for the second one?
> The first one you can follow, what's the reasoning for the second one?
eeeehm, it's a decision by cpplint, and the best I could find is this issue and the respective section in the Google guidelines - and, of course, I usually do the same, as said before: making all inputs `const` so I can't modify them
we can even decide to ignore it and just merge at this point
I'll leave you the choice; I consider it good enough as is.
With my collaborators we're trying to redo a previous analysis, and so I was thinking about including a PineAPPL interface in my MC event generator (https://github.com/felixhekhorn/LeProHQ), and I started to write a proper object-oriented wrapper (since I like C++ much better than C ;-) ). (Actually, it would be much faster to just run the code for 1000 replicas than to write the wrapper, but since I like the project I'll try to do it nevertheless - and it can then be used in the future)