NNPDF / pineappl

PineAPPL is not an extension of APPLgrid
https://nnpdf.github.io/pineappl/
GNU General Public License v3.0
12 stars 3 forks source link

Python interface improvements #74

Closed alecandido closed 2 years ago

alecandido commented 2 years ago

While working on stuffs that make use of python interface, new needs arose. Thus, there is room to add some improvements.

Following a list of possible ones:

alecandido commented 2 years ago

As discussed with @felixhekhorn we'll reserve Grid.__get_item__() and Grid.__set_item__() and Grid.__iter__() for managing subgrids, since Grid is a subgrids container in the first place (and a metadata container only afterwards).

alecandido commented 2 years ago

Most of the functions of CLI consists mainly in printing tables, something we don't need for the python interface (since we can handle objects, and it is comfortable to do it), nevertheless we are wondering if it's worth to bind pineappl_cli.helpers.

cschwan commented 2 years ago

Most of the functions of CLI consists mainly in printing tables, something we don't need for the python interface (since we can handle objects, and it is comfortable to do it), nevertheless we are wondering if it's worth to bind pineappl_cli.helpers.

If you find that helpful I'd rather suggest to import the functionality into PineAPPL and remove the corresponding helper (and write a wrapper to the new function in PineAPPL). I'm definitely going to do that for the convolute function, which takes care of charge conjugation and different initial states; this also overlaps with #66. Let me know if you need more.

alecandido commented 2 years ago

Most of the functions of CLI consists mainly in printing tables, something we don't need for the python interface (since we can handle objects, and it is comfortable to do it), nevertheless we are wondering if it's worth to bind pineappl_cli.helpers.

If you find that helpful I'd rather suggest to import the functionality into PineAPPL and remove the corresponding helper (and write a wrapper to the new function in PineAPPL). I'm definitely going to do that for the convolute function, which takes care of charge conjugation and different initial states; this also overlaps with #66. Let me know if you need more.

I would say just convolute() (and consequently SCALES_VECTOR). I can do here or we can do separately, that's fine.

You can move yourself if you wish, or I can move it, but: where should I place? and how should I call it? (my guess would be to put it in grid.rs as a separate functions, i.e. not inside the impl block, but I don't have any good name other than convolute).

cschwan commented 2 years ago

Most of the functions of CLI consists mainly in printing tables, something we don't need for the python interface (since we can handle objects, and it is comfortable to do it), nevertheless we are wondering if it's worth to bind pineappl_cli.helpers.

If you find that helpful I'd rather suggest to import the functionality into PineAPPL and remove the corresponding helper (and write a wrapper to the new function in PineAPPL). I'm definitely going to do that for the convolute function, which takes care of charge conjugation and different initial states; this also overlaps with #66. Let me know if you need more.

I would say just convolute() (and consequently SCALES_VECTOR). I can do here or we can do separately, that's fine.

You can move yourself if you wish, or I can move it, but: where should I place? and how should I call it? (my guess would be to put it in grid.rs as a separate functions, i.e. not inside the impl block, but I don't have any good name other than convolute).

My current idea is that instead of passing three functions (xfx1, xfx2 and alphas) you pass a single PdfCache object (for which I have yet to write the struct). You can then easily cache PDFs, even among several convolute calls, which will speed up a few of the subcommands.

alecandido commented 2 years ago

Added syntactic sugar to get and set subgrids, but the get function is not really working well.

The following code:

import pineappl
g = pineappl.grid.Grid.read("dyaa.pineappl")
s = g[0, 0, 0]
g[0, 1, 0] = s

(with dyaa.pineappl the grid generated from the example) raises with the following error

AttributeError                            Traceback (most recent call last)
<ipython-input-5-346819bef8ca> in <module>
----> 1 g[0, 1, 0] = s

~/Projects/N3PDF/pineappl/pineappl_py/env/lib/python3.9/site-packages/pineappl/grid.py in __setitem__(self, key, subgrid)
    140             raise ValueError("A tuple with `(order, bin, lumi)` is required as key.")
    141
--> 142         self.set_subgrid(*key, subgrid)
    143
    144     def set_remapper(self, remapper):

~/Projects/N3PDF/pineappl/pineappl_py/env/lib/python3.9/site-packages/pineappl/grid.py in set_subgrid(self, order, bin_, lumi, subgrid)
    122                 subgrid content
    123         """
--> 124         self.raw.set_subgrid(order, bin_, lumi, subgrid.into())
    125
    126     def __setitem__(self, key, subgrid):

AttributeError: 'builtins.PySubgridEnum' object has no attribute 'into'

Essentially the error is due to the subgrid not having been recast to any specific subgrid type when retrieved, so the subgrid.into() call is unneeded. I believe it should be recast to a meaningful type to be able to do operations, but since I don't know which one should be maybe that's a hint I'm not thinking to the correct solution.

cschwan commented 2 years ago

As discussed with @felixhekhorn we'll reserve Grid.__get_item__() and Grid.__set_item__() and Grid.__iter__() for managing subgrids, since Grid is a subgrids container in the first place (and a metadata container only afterwards).

Do __get_item__ and others significantly improve the usability of Grid? I'm a bit sceptical here; I'd actually like to discourage the use of subgrids, because everything should be done using convolute, convolute_subgrid and convolute_eko. This doesn't always work (for instance in yadism), but that should be the aim.

codecov[bot] commented 2 years ago

Codecov Report

Merging #74 (714b559) into master (c737cba) will increase coverage by 10.04%. The diff coverage is 88.46%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #74       +/-   ##
===========================================
+ Coverage   67.65%   77.69%   +10.04%     
===========================================
  Files          13       14        +1     
  Lines        2226     2049      -177     
===========================================
+ Hits         1506     1592       +86     
+ Misses        720      457      -263     
Impacted Files Coverage Δ
pineappl/src/fk_table.rs 38.96% <ø> (+2.37%) :arrow_up:
pineappl/src/lumi.rs 70.87% <ø> (+63.18%) :arrow_up:
pineappl/tests/drell_yan_lo.rs 100.00% <ø> (ø)
pineappl/src/grid.rs 72.57% <88.46%> (+11.72%) :arrow_up:
pineappl/src/subgrid.rs 96.87% <0.00%> (-0.46%) :arrow_down:
pineappl_cli/src/main.rs 0.00% <0.00%> (ø)
pineappl/src/bin.rs 88.19% <0.00%> (+0.54%) :arrow_up:
pineappl/src/lagrange_subgrid.rs 87.13% <0.00%> (+2.52%) :arrow_up:
pineappl/src/empty_subgrid.rs 100.00% <0.00%> (+5.88%) :arrow_up:
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update c737cba...714b559. Read the comment docs.

alecandido commented 2 years ago

Do __get_item__ and others significantly improve the usability of Grid? I'm a bit sceptical here; I'd actually like to discourage the use of subgrids, because everything should be done using convolute, convolute_subgrid and convolute_eko. This doesn't always work (for instance in yadism), but that should be the aim.

Ok, this might be a point. Nevertheless, I wonder if giving write access without read access is a good idea (about __setitem__ instead is just for building abstractions, in python it is useful to build on existing abstractions, and if a Grid is a subgrid containers you expect to be able to access them, set them and iterate over them, doing it with standard functions helps the user to rely on known abstractions).

In case I can just revert and remove Grid.subgrid().

alecandido commented 2 years ago

Instead I'm running in more serious troubles with Grid.merge().

We decided to keep Grid non-clonable, and I agree it is a good idea. But for Grid.merge() we need to pass a second grid, and it looks like it doesn't like:

error[E0277]: the trait bound `PyGrid: Clone` is not satisfied
   --> src/grid.rs:345:40
    |
345 |     pub fn merge(&mut self, mut other: Self) {
    |                                        ^^^^ the trait `Clone` is not implemented for `PyGrid`
    |
    = note: required because of the requirements on the impl of `pyo3::FromPyObject<'_>` for `PyGrid`

Any suggestion?

P.S.: PyGrid is not clonable because it contains a Grid, that as said above it's not clonable

cschwan commented 2 years ago

Instead I'm running in more serious troubles with Grid.merge().

We decided to keep Grid non-clonable, and I agree it is a good idea. But for Grid.merge() we need to pass a second grid, and it looks like it doesn't like:

error[E0277]: the trait bound `PyGrid: Clone` is not satisfied
   --> src/grid.rs:345:40
    |
345 |     pub fn merge(&mut self, mut other: Self) {
    |                                        ^^^^ the trait `Clone` is not implemented for `PyGrid`
    |
    = note: required because of the requirements on the impl of `pyo3::FromPyObject<'_>` for `PyGrid`

Any suggestion?

P.S.: PyGrid is not clonable because it contains a Grid, that as said above it's not clonable

That's a tough one, let me think about it.

felixhekhorn commented 2 years ago

Mmm, I think the only solution (if we want to keep Grid non-clonable) is to provide a function merge_from_file(Grid, path) and then read the Grid in Rust ...

cschwan commented 2 years ago

Mmm, I think the only solution (if we want to keep Grid non-clonable) is to provide a function merge_from_file(Grid, path) and then read the Grid in Rust ...

Yes, I agree this is how we should do it!

alecandido commented 2 years ago

Mmm, I think the only solution (if we want to keep Grid non-clonable) is to provide a function merge_from_file(Grid, path) and then read the Grid in Rust ...

Yes, I agree this is how we should do it!

Then I'm going to implement it immediately

alecandido commented 2 years ago

Rebased on current status of master

cschwan commented 2 years ago

The item concerning the convolute helpers now basically amounts to writing a Python wrapper to LumiCache. I need to replace every call to convolute with a corresponding call to convolute2 (and after this is done rename convolute2 to convolute), which will require a LumiCache. This cache you can create with one PDF set (LumiCache::with_one) or with two PDF sets (LumiCache::with_two). You'll need only one PDF set for all DIS processes, for most LHC processes, and for Tevatron and similar colliders, because the anti-proton PDF is automatically generated through charge conjugation in LumiCache. Two PDF sets are really only needed for processes like proton-lead collisions.

I'll try to make the necessary adjustments to the Python wrapper myself.

cschwan commented 2 years ago

The second item should be ticked off with the last two commits. @AleCandido @felixhekhorn could you please test it with Pineko and everywhere the Python interface is used? It's unfortunately a breaking change, because you have to pass the PDG MC ID of the hadron whose PDF you're passing to PineAPPL. On the other hand, PineAPPL takes care of charge conjugation needed for anti-protons, and it'll be possible to also calculate lead-proton collisions.

alecandido commented 2 years ago

The second item should be ticked off with the last two commits. @AleCandido @felixhekhorn could you please test it with Pineko and everywhere the Python interface is used? It's unfortunately a breaking change, because you have to pass the PDG MC ID of the hadron whose PDF you're passing to PineAPPL. On the other hand, PineAPPL takes care of charge conjugation needed for anti-protons, and it'll be possible to also calculate lead-proton collisions.

Thank you very much, I'm definitely going to do it today.

I'm not even sure we should do the third item, so today I'm going to finish NNPDF/runcards#111 (that it would be finished, if I hadn't added a new item for cuts), and then I'll check that and tackle the forth item, and maybe we could even merge.

felixhekhorn commented 2 years ago
  • maybe a function to generate order masks from something similar to "NLO QCD" or "NLO EW", see What is the syntax for the orders? #73
  • Python binding for Add Grid::write_lz4 method #80

Maybe we can defer item 3 and 4 to a new PR in order to close this one?

alecandido commented 2 years ago

Maybe we can defer item 3 and 4 to a new PR in order to close this one?

I'd keep 4 in this one, and postpone just 3.

cschwan commented 2 years ago

The last item is implemented in commit dfdcdcc for the Rust API and in the CLI. Whenever you run pineappl with a subcommand that writes a file, giving the file the extension .lz4 also let's the CLI compress it.

alecandido commented 2 years ago

Re https://github.com/N3PDF/pineappl/pull/74#issuecomment-951068705

@cschwan tested with pineko together with @felixhekhorn, it required a very tiny fix (b1bba7e) but it's now working.

alecandido commented 2 years ago

Sorry to have waited for so long, but now this PR should address even point 4 with a4bf1f2, it was very simple.

Closes #80

cschwan commented 2 years ago

Commit 714b559 should let you select a subset of orders in a Grid when generating an FkTable with Grid::convolute_eko.

cschwan commented 2 years ago

@AleCandido @felixhekhorn can we merge this?

alecandido commented 2 years ago

I reviewed just now, and the only thing I could spot is really the migration of convolute_subgrid, but there is a TODO and we can do on master.

Then let's merge, I would say.

cschwan commented 2 years ago

@AleCandido @felixhekhorn do you need another release for the Python interface to be uploaded? If so, I'm going to release it as 0.5.0-beta.2.

alecandido commented 2 years ago

Yes, that would be nice. Then I'm going to update dependencies in runcards, pineko. and yadism.

felixhekhorn commented 2 years ago

Not necessarily, but a release would make things easier and since the last we added several new features ... by the way, are you're sure about the naming? shouldn't it be 0.5.0b2? see https://pypi.org/project/poetry/#history

alecandido commented 2 years ago

I believe there is no official way of naming alpha, beta, and release_candidate (for sure it's not specified in semver). Just be consistent with one scheme.

cschwan commented 2 years ago

Yes, as far as semver is concerned both are OK (I think).

alecandido commented 2 years ago

Ok, I just checked and the answer is definitely yes:

A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]. Identifiers MUST NOT be empty. Numeric identifiers MUST NOT include leading zeroes. Pre-release versions have a lower precedence than the associated normal version. A pre-release version indicates that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version. Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.–.

https://semver.org/#spec-item-9

or even more: @cschwan is more correct then poetry, that is not using the dash (but even the official CPython is omitting the dash...)

cschwan commented 2 years ago

@AleCandido I released v0.5.0-beta.2, but building the documentation fails and I can't reproduce the error: https://readthedocs.org/projects/pineappl/builds/15107369/

alecandido commented 2 years ago

Sorry not to have replied before, I'm glad that you already solved the issue!