Optimize `FkTable` representation

scarlehoff commented 3 years ago

For instance, at the initial scale we have that

sigma = T24 = T35 V = v15 = v24 = v35

One thing that the NNPDF fktables do is to say "ok, since these are equal I'll just sum everything in, say, V, and empty the other rows" so one gets "V+V15+V24+V35" but the luminosity function is just V.

cschwan commented 3 years ago

That's a good idea. However, it only works when the 1) charm PDF is equal to the anti-charm PDF and if, at the same time, 2) bottom- and top-quark PDFs are zero at the FK table scale. Since this isn't the case in general, but is the case for NNPDF, the place where we can do this is in Grid::convolute_eko; let's add @AleCandido and @felixhekhorn to see what they're saying.

In practice I'd give convolute_eko a few more parameters that allow the specification of these constraints. That metadata of the FK tables should also reflect the constraints, and ideally allow PineAPPL to check the PDF that is used to prevent the user from using sets that don't fulfill them. We should probably discuss more optimization strategies that in general would allow us to speed up convolute_eko and make the FK tables smaller; I'm changing the title of this Issue to reflect that.

scarlehoff commented 3 years ago

1) charm PDF is equal to the anti-charm PDF

@felixhekhorn had the same worry. I'd say for now we can accept that to be true

2) bottom- and top-quark PDFs are zero at the FK table scale

But this is always true (or always false) for a given fktable, right?

Personally I was thinking of reduced_table() and reduced_lumi() methods (or something like that) where the simplification is done by the user (i.e., me) if they (i.e., I) need a smaller fktable. It doesn't really matter to me if the fktable are unnecessarily big on disk or if I need to spend a few more seconds to load them* but I want them to be as small as possible in memory for the fit.

*of course, I'm even happier if their size is reduced also in disk but that's a second order problem for me

felixhekhorn commented 3 years ago

But as you said this is a NNPDF thing so let's do as @scarlehoff and I said in person: let's change the .tables() ... (that is at the latest point possible ...)

alecandido commented 3 years ago

I agree on not to add any further complication to convolute_eko, and add a method to the FkTable object.

However, instead of providing methods for a different representation, I would add a method to get a new FkTable object by collapsing some rows of the original one. And then you can even dump the new object, already collapsed.

felixhekhorn commented 3 years ago

We should probably discuss more optimization strategies that in general would allow us to speed up convolute_eko and make the FK tables smaller; I'm changing the title of this Issue to reflect that.

at some point definetly yes, however let's focus for the moment on the things we need to reach our deadline :upside_down_face:

alecandido commented 3 years ago

I would say that the idea of returning a new collapsed object serves well any possible usage:

if you don't care about disk representation and initial effort, you can load whatever you have there, call the method and get the new collapsed FkTable in memory
if you do care, you do the operation offline, and you dump the result

cschwan commented 3 years ago

I would say that the idea of returning a new collapsed object serves well any possible usage:

* if you **don't** care about disk representation and initial effort, you can load whatever you have there, call the method and get the new collapsed `FkTable` in memory

* if you **do** care, you do the operation offline, and you dump the result

That's a good idea! We could call it FkTable::optimize(&self, constraints: ...) -> FkTable.

alecandido commented 3 years ago

I agree, at the end of the day an FkTable is conceptually different from a Grid, so there is no clash.

felixhekhorn commented 3 years ago

That's a good idea! We could call it FkTable::optimize(constraints: ...) -> FkTable.

Mmm but this clashes with our idea of an FKTable beeing immutable ...

cschwan commented 3 years ago

That's a good idea! We could call it FkTable::optimize(constraints: ...) -> FkTable.

Mmm but this clashes with our idea of an FKTable beeing immutable ...

Not really, because it returns another FkTable and doesn't modify self.

felixhekhorn commented 3 years ago

but will this work if the thing is not clonable? (and we want this)

cschwan commented 3 years ago

That's a good question...

cschwan commented 2 years ago

@AleCandido @felixhekhorn Important question: are the grids (opposed to the grids convoluted with PDFs) for sigma, T24, T35, and for V, V15, V24, V35 the same? In that case we can easily implement this optimization at the level of redefining the luminosity functions.

felixhekhorn commented 2 years ago

No (if I understand correctly) because some processes, e.g. CC DIS or CC DY, can resolve difference between b and bbar, e.g., and hence V24 and T24 are unique

scarlehoff commented 2 years ago

@cschwan is this available? i..e, can I do .optimize() to get

sigma = T24 = T35 V = v15 = v24 = v35

? Or maybe this should be done by pineko (since it depends on the final scale)? @felixhekhorn

cschwan commented 2 years ago

@scarlehoff The short answer is no. How important is that for the fit in terms of a timescale?

I'm a bit hesitant to implement the optimization because I'm still wondering what the proper way to do it would be. Isn't this technically a different basis?

In any case I'd like make the assumptions explicit somewhere so that the user can't use a PDFs where these assumptions are not fulfilled.

scarlehoff commented 2 years ago

@scarlehoff The short answer is no. How important is that for the fit in terms of the timescale?

When we do the fit with all the tables the memory is going to grow quite a bit and thus the fits will take longer. It's not a disaster but it's a big QoL feature.'

Isn't this technically a different basis?

Yes and no. It's just an optimization over the evolution basis at a given scale.

In any case I'd like make the assumptions explicit somewhere so that the user can't use a PDFs where these assumptions are not fulfilled.

One possibility is that this is done at loading time, so that one can ask for the fktable table or the optimized_fktable where some of the values are summed (for instance T24 and 35 get absorbed into singlet) Saving them in an already optimized format makes it easier in the storage but I don't care that much about that part.

cschwan commented 2 years ago

When we do the fit with all the tables the memory is going to grow quite a bit and thus the fits will take longer. It's not a disaster but it's a big QoL feature.'

I see, that's why I've added this to the v0.5.0 milestone.

Would the interface to the fit understand if I'd remove T24, T35, V15, V24 and V35 from the list of luminosities or should I keep them in there (their contribution would be zero)?

@felixhekhorn @AleCandido I'd suggest to add one parameter to FkTable::optimize, because somewhere higher up the theory must tell us which assumptions we need (for instance whether we distinguish between charm and anti-charm as mentioned in the PC, or not).

scarlehoff commented 2 years ago

Would the interface to the fit understand if I'd remove T24, T35, V15, V24 and V35 from the list of luminosities or should I keep them in there (their contribution would be zero)?

Since this is an optimization feature I guess all 0s should be removed for maximum optimization.

alecandido commented 2 years ago

In any case, we don't want this to be a breaking change: we want the fit to work without, but it would be quite bad if it's not going to still work with the 0s.

Thus, I propose to release v0.5.0, and start a v0.5.1 milestone

cschwan commented 2 years ago

@felixhekhorn @AleCandido I'd suggest to add one parameter to FkTable::optimize, because somewhere higher up the theory must tell us which assumptions we need (for instance whether we distinguish between charm and anti-charm as mentioned in the PC, or not).

Since FkTable::optimize doesn't exist yet, contrary to what I said above, we can add FkTable::optimize which accepts one parameter that specifies the assumptions. This wouldn't be a breaking change, and then we can add it in v0.5.1, as @AleCandido said.

cschwan commented 2 years ago

Commit cf02e33dc65c53343c0276fd411094c6fe2bc95e should implement all interface-visible changes to do exactly what @scarlehoff proposed.

As promised there's one new method FkTable::optimize, which requires an enum FkAssumptions; please have a look at the documentation of it,

https://github.com/N3PDF/pineappl/blob/cf02e33dc65c53343c0276fd411094c6fe2bc95e/pineappl/src/fk_table.rs#L47-L72

and let me know if I can improve it. It should enable a general version of the optimization that @scarlehoff proposed in the first comment, if we want to fit bottom = anti-bottom, for instance. The optimization should work for FK tables in the flavour and evolution-basis.

What we need next is 1) the corresponding method in the Python interface and 2) probably a slightly modified theory that specifies the assumption and that calls this method in Pineko/fktable. @AleCandido @felixhekhorn could you please have a look at this?

What's still missing is the actual optimization, for the time being only the luminosity function is rewritten and therefore one gets repeated entries. I'm working on that now.

alecandido commented 2 years ago

I'll update immediately Python interface, I guess there are only two things to bind:

[x] extra method FkTable.optimize()
[x] enum FkAssumptions, that is needed as an input

It should not be hard.

alecandido commented 2 years ago

In 834a041397842c16cedac326e5012632a0e81dab I added the bindings for the method.

I had to make the FkAssumptions Copy (it's just a plain C-like enum, so it should be fine). In order to be Copy, it had to be Clone (according to cargo, and so I guess rustc).

felixhekhorn commented 2 years ago

At this point it would have been worth a dedicated PR ;-)

alecandido commented 2 years ago

In c63b0ea0b2355b84b8d16b8d767e34513bb830c1 it becomes actually possible to generate FkAssumptions, choosing them by string.

Since this mapping is encoded twice: https://github.com/N3PDF/pineappl/blob/c63b0ea0b2355b84b8d16b8d767e34513bb830c1/pineappl/src/fk_table.rs#L323-L331 https://github.com/N3PDF/pineappl/blob/c63b0ea0b2355b84b8d16b8d767e34513bb830c1/pineappl_py/src/fk_table.rs#L34-L42

I would suggest to split it in an HashMap and reuse it.

cschwan commented 2 years ago

@AleCandido I agree, I'll implement the trait Display and TryFrom (?) for FkAssumptions, for which I'll probably also make the strings look like exactly like the enum values. We should probably hold off feeding strings into the Python interface until this is final.

alecandido commented 2 years ago

Ok.

At the moment, it is what it is, if you wish to remove it you can do it, but since it's not part of a release I believe we can keep it, and use creation from Python just to experiment. The moment you have a better candidate, I'll make sure to update.

cschwan commented 2 years ago

Implemented in commit 4427e9ed22c50cb4b8eacbc351c60aee7c01320f.

cschwan commented 2 years ago

Commit 19aa0a5004a906ab5a6ad1a10a8a1fded378d640 should implement everything we need, and adds the option --fk-table=Nf4Sym, for instance, to pineappl optimize. This will allow us to optimize the already generated FK tables. I still have to test everything, though.

alecandido commented 2 years ago

Well done :D

Is there anything left that is needed (but testing) @scarlehoff?

scarlehoff commented 2 years ago

To first approximation only testing.

cschwan commented 2 years ago

First tests look very promising:

run mkdir ../appl_subfktables_opt && for i in *.pineappl.lz4; do pineappl optimize --fk-table=Nf4Sym $i ../appl_subfktables_opt/$i; done in /media/FK/fktables/data/appl_subgrids on dom to optimize the tables - that's surprisingly quick
the size difference between appl_subfktables and appl_subfktables_opt is quite nice: 1.3 GB vs. 643 MB, that's a bit less than half the size
the differences after convolution are typically 1e-12, at maximum 1e-10; run: for i in *.pineappl.lz4; do pineappl --silence-lhapdf diff --ignore-lumis $i ../appl_subfktables_opt/$i NNPDF40_nnlo_as_01180; done

cschwan commented 2 years ago

Finally it seems to do what it promises:

pineappl obl --lumis appl_subfktables_opt/200-ATLASDY2D8TEV-aMCfast_obs_0.pineappl.lz4

doesn't show evolution IDs higher than 208 (V8) and 115 (T15). Prior to optimization this grid has 91 luminosities, after optimization there are only 45 luminosities which is plausible if you think about the size reduction.

Applying pineappl optimize to the PineAPPL grids seems to be non-modifying as expected.

cschwan commented 2 years ago

As far as PineAPPL is concerned this Issue can be closed, if you agree. Don't forget to call FkTable::optimize in pineko.

alecandido commented 2 years ago

In order to use this in pineko, we'll need a new release of pineappl.

If you don't have anything more to add close in the future @cschwan, I would just release 0.5.1.

scarlehoff commented 2 years ago

I have one question, the optimize only removes 1) flavours that are 0 and 2) reduces flavours by summing them into the same "histogram-bin" (depending on the theory) but the xgrid remains the same for all fktables? right @cschwan ?

cschwan commented 2 years ago

That's correct, @scarlehoff!

NNPDF / pineappl

Optimize `FkTable` representation #79