NNPDF / pineappl

PineAPPL is not an extension of APPLgrid
https://nnpdf.github.io/pineappl/
GNU General Public License v3.0
12 stars 3 forks source link

Extend fill grid python interface #107

Closed alecandido closed 2 years ago

alecandido commented 2 years ago

This should close #81

I'm not sure that getting as input a 2 dimensional numpy array is the best idea ever: https://github.com/N3PDF/pineappl/blob/cf7a8828e73d77a8e4b0c9a05a369617932e95d2/pineappl_py/src/grid.rs#L150-L156 but at the end of the day NTuple is a meaningful concept in pineappl, and it is more or less a named array of f64 elements, so it's not that bad. The only other option would be to have 4 matching arrays x1s, x2s, q2s, weights:

alecandido commented 2 years ago

I'm doing something ugly in fill_with_array: https://github.com/N3PDF/pineappl/blob/cf7a8828e73d77a8e4b0c9a05a369617932e95d2/pineappl_py/src/grid.rs#L158

Instead I'd like to:

if possible in a single statement.

@cschwan any idea?

codecov[bot] commented 2 years ago

Codecov Report

Merging #107 (92f802c) into master (70c710a) will increase coverage by 0.15%. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #107      +/-   ##
==========================================
+ Coverage   88.43%   88.58%   +0.15%     
==========================================
  Files          31       31              
  Lines        3070     3111      +41     
==========================================
+ Hits         2715     2756      +41     
  Misses        355      355              
Impacted Files Coverage Δ
pineappl_cli/src/main.rs 100.00% <0.00%> (ø)
pineappl_cli/src/lumi.rs
pineappl_cli/src/obl.rs 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 70c710a...92f802c. Read the comment docs.

Zaharid commented 2 years ago

The interface looks ok to me. I think these would be the requisite primitives. ISTM it would be slightly simpler to pass separate arrays and typically faster since it would typically avoid having to construct a big array somewhere in Python.

alecandido commented 2 years ago

Okay, so I'd say:

  1. the name was chosen at random, fill_with_array had the slight advantage of making explicit that I'm filling with an array, rather than filling an array, but I really don't care, so I'll put fill_array and we even preserve some compatibility
  2. I lacked context, and since observables have to be passed as an array as well (a 1-D sequence in general, that will be an array in practice) we completely loose the pineappl concept of N-tuple (in the sense that for each index you need more information than the tuple itself) and even @Zaharid's point about efficiency looks relevant to me, so we'll have 5 arrays instead
  3. about lumi and order I'd definitely go for separate calls: as you said, should not be too much of a overhead, and moreover, if you want to have a complicate setup of lumis, orders, and observables, you need to iterate over and flat the output in a 1-D array (while intrinsically it would be a higher dimensional one), and I see it as an unneeded complication, reducing transparency of the algorithm

In summary, I agree with all your points and I can take care of implementing them, it should take little. Then I'll simply lack some simple tests before merging.

alecandido commented 2 years ago

Hi @cschwan, are you aware of any rust pattern to avoid repeated instances of calls like this: https://github.com/N3PDF/pineappl/blob/dd11579c979efba764a9181997190d7e07ee34b5/pineappl_py/src/grid.rs#L171-L177 I found this post about the topic, but it seems like it's not currently possible to implement my idea of:

cschwan commented 2 years ago

@AleCandido I believe that's the correct way to do it. I'm not aware of any 'better' way.

alecandido commented 2 years ago

:disappointed:

Zaharid commented 2 years ago

I suppose one could look into the izip macro and extend it, but doesn't seem worth it.

alecandido commented 2 years ago

I suppose one could look into the izip macro and extend it, but doesn't seem worth it.

This for sure...

alecandido commented 2 years ago

I added very basic tests, but unfortunately I can't do more without implementing a wider interface.

At the moment, we have a very small API for subgrids (I already pointed out this somewhere else, while I was investigating a grid in the interpreter), and, in particular, we can only set values, but never read. Of course, there is the extra complication that there exists multiple kinds of subgrids...

Sooner or later (but not too late, since it's useful for debugging), I'd add methods like iter and stats, but definitely in another PR. I'm ready to merge.

cschwan commented 2 years ago

I added very basic tests, but unfortunately I can't do more without implementing a wider interface.

At the moment, we have a very small API for subgrids (I already pointed out this somewhere else, while I was investigating a grid in the interpreter), and, in particular, we can only set values, but never read. Of course, there is the extra complication that there exists multiple kinds of subgrids...

Sooner or later (but not too late, since it's useful for debugging), I'd add methods like iter and stats, but definitely in another PR. I'm ready to merge.

Subgrid::iter() would be the proper way to directly 'read' a subgrid apart from Subgrid::convolute().

cschwan commented 2 years ago

I added very basic tests, but unfortunately I can't do more without implementing a wider interface.

You could copy the example in https://github.com/N3PDF/pineappl/blob/master/examples/python/dyaa.py and make it a test!

cschwan commented 2 years ago

@AleCandido do you think this is ready for merging? If yes, please go ahead!

alecandido commented 2 years ago

Not yet, it's actually missing the test: I dismissed the change request because it was related to fill_array, and so outdated, but I want to implement the test first.

P.S.: scheduled today

cschwan commented 2 years ago

P.S.: scheduled today

Great :+1:! This is the last Issue for v0.5.0, after having it fixed we'll release a new Rust version, a new Python pip version, and now I'm having a look at the conda package: https://github.com/conda-forge/pineappl-feedstock. That'd mean we can announce it tomorrow and let people test it. @Zaharid might be interested in it.

Will NNPDF/runcards, N3PDF/pineko, NNPDF/fktables be affected by the Python changes?

alecandido commented 2 years ago

Will NNPDF/runcards, N3PDF/pineko, NNPDF/fktables be affected by the Python changes?

This PR is extending the API, and #106 is backward compatible at the end. So maybe #107 is the only one that might break something.

However, runcards is managed by poetry, so versions are pinned, the moment I'll try to update I simply have to test everything is still working. The other two @felixhekhorn uses daily, so we'll find out soon :)

alecandido commented 2 years ago

I raise a white flag:

Here I need a bit more of domain knowledge to put some meaningful number, and test something different from 0.

So, if you want to play a bit with numbers @cschwan do it, otherwise we can even merge (a test with 0 is not that meaningful, but at least we're testing the grid is not corrupted and can be convoluted...).

cschwan commented 2 years ago

@AleCandido thanks again, I think that's sufficient for the time being, everything else we can do in Issue #108.