NNPDF / pineappl

PineAPPL is not an extension of APPLgrid
https://nnpdf.github.io/pineappl/
GNU General Public License v3.0
12 stars 3 forks source link

Merging grids results in unexpected behavior #286

Open ggvanseev opened 2 months ago

ggvanseev commented 2 months ago

Issue arises in python, did not check other places. Merging grids would from a user perspective result in the addition of the values at same bins x1,x2, etc. and result into addition at missing values. However, the merge results in errors or a None object.

Error case: Input: grid = pineappl.grid.Grid.read(path1) grid.merge(grid) Output: Traceback (most recent call last): File "<string>", line 1, in <module> File "/data/theorie/gseevent/.conda/envs/lhapdf/lib/python3.11/site-packages/pineappl/grid.py", line 440, in merge self.raw.merge(other.raw) RuntimeError: Already mutably borrowed

None case Input: grid=pineappl.grid.Grid.read(path1) grid2 = pineappl.grid.Grid.read(path2) grid3 = grid2.merge(grid)

'Output', grid3: None

ggvanseev commented 2 months ago

In the None case it is important to note that grid and grid2 do have the exact same bins, q2 and orders (i.e. they are identical except for the exact filling of the bins and the creation datetime)

felixhekhorn commented 2 months ago

Error case:

I'm not sure what the expected result of this operation is. I think the grid object is now clone-able in case you want to add the same thing to its self (but why not simply multiply=scale by 2 then?). Else I think Rust is complaining about ownership, i.e. the Rust Grid::merge has both arguments as mutable which is unresolvable (as the error says). (@cschwan actually why does other need to be mutable there?)

None case

This is expected, since the merging is happening in place , i.e. grid2 got modified and Grid::merge does indeed return None

cschwan commented 2 months ago

I suppose the Python interface should convert the return value of Grid::merge it into a Python exception if the return values isn't Ok(()). For the time being simply ignore the return value (None means it worked fine), merge mutates the object as @felixhekhorn already pointed out.

I'm not sure why grid.merge(grid) fails with the error message that you posted, but you probably don't want to do that either.

alecandido commented 2 months ago

(@cschwan actually why does other need to be mutable there?)

https://github.com/NNPDF/pineappl/blob/f52469ca649656ef6d4ef365a9dbc087b312823f/pineappl/src/grid.rs#L917

I guess?

Though I'm not sure why other's bin limits are allowed to change. Not explained in the docs... https://github.com/NNPDF/pineappl/blob/f52469ca649656ef6d4ef365a9dbc087b312823f/pineappl/src/grid.rs#L887-L892

Issue arises in python, did not check other places.

In any case, even though there would be a meaning to merge a grid with itself (that I'm not sure I see, since it should be a scale by 2, as @felixhekhorn said), this is unsupported in Rust itself...

alecandido commented 2 months ago

I suppose the Python interface should convert the return value of Grid::merge it into a Python exception if the return values isn't Ok(()).

@cschwan it is actually doing it, it is raising a RuntimeError

(None means it worked fine),

All Python functions return something. If you do not return anything, they return None. So, you're just reading the output of a function not returning anything, since it's acting in-place (as pointed out by @felixhekhorn).

cschwan commented 2 months ago

(@cschwan actually why does other need to be mutable there?)

It's an optimization. If we merge an empty grid with a non-empty one, we simply ~copy~ move the non-empty one over the empty. That's fast and memory efficient, and it happens fairly often if you create grids in the way that Madgraph5_aMC@NLO does.

alecandido commented 1 month ago

It's an optimization. If we merge an empty grid with a non-empty one, we simply ~copy~ move the non-empty one over the empty.

I'm not sure I understand why you might want to merge an empty grid...

That's fast and memory efficient, and it happens fairly often if you create grids in the way that Madgraph5_aMC@NLO does.

... but not being familiar with the way mg5 uses PineAPPL could be the actual motivation.

(though I'd often suggest avoiding having a function behave in two different ways, and instead use two different functions for that)

cschwan commented 1 month ago

It's an optimization. If we merge an empty grid with a non-empty one, we simply ~copy~ move the non-empty one over the empty.

I'm not sure I understand why you might want to merge an empty grid...

I meant so say subgrid. If you merge grids in which each only one channel is filled (that's how Madgraph5_aMC@NLO does it), then you end up having a lot of empty subgrids where merging is trivial if you allow subgrids to be moved out of the other grid.

alecandido commented 1 month ago

I meant so say _sub_grid. If you merge grids in which each only one channel is filled (that's how Madgraph5_aMC@NLO does it), then you end up having a lot of empty subgrids where merging is trivial if you allow subgrids to be moved out of the other grid.

Ok, this is a perfectly valid explanation.

In principle, you might need both, the mutable and immutable version (in which you copy). In practice, there might be no use case for the immutable, so it is good as it is, and the error is perfectly fair.

Maybe, the only missing bit is to document this behavior in the method docstring.

ggvanseev commented 1 month ago

Error case:

This one is clear, and an unnecessary operation anyway. However, perhaps adding an error that just explains that merging the same grid with itself is unsupported would be user friendly.

None case

This is also clear. In terms of enhancing user experience, I recommend either updating the docstring and documentation to clarify that the operation occurs in place, or following the convention of the pandas library by returning the merged grid as a new object.