Open alecandido opened 8 months ago
This is not urgent at all, it is just a consideration that came to my mind reading #248, nothing more.
I had similar considerations in the past, but
pineappl_py
in the Python plotting script is possible, but then we introduce two more dependencies: pineappl_py and LHAPDF. Especially the latter, particularly the Python version, is a pain in the neck to get working. Another downside is that the script will take much longer, which we probably want to avoid, given that we often work on the Python script to modify the plot, without having to regenerate the data.That being said I believe the plotting script(s) and functionality can be significantly improved. I was thinking of using a templating engine in such a way that the template is really kept in one file. This mostly is the case right now (in this file: https://github.com/NNPDF/pineappl/blob/master/pineappl_cli/src/plot.py), but writing out the data could be improved, for instance.
Regarding the data handling, I see your point. However, I prefer to keep things separate, and rather gather through other means (e.g. folders or archives), since it makes it easier to consume them even for further purposes. Fine to cache the results anyhow, but it could be implemented really as cache: if this file exists, use it, otherwise recompute it. The file name would stem from the grid, with the option to manually pass a file.
The idea of moving to Python is really to avoid the template (with or without a suitable engine). This is because I appreciate more libraries than code generators (less copy-paste, version management, and similar benefits). They are even easier to test and debug, because they are native in the language they are written in (while you'd need to use Python to test the script generated from Rust).
Since this part of the CLI functionality is already mostly Python, I would expose the interface through Python. We could add the bindings for the rest in the same package of the plotting CLI.
P.S.: regarding point 4., the proposal was to write a Python function main(data, metadata)
, and make the call to main
from Rust, through PyO3, such that the data are produced by Rust (with its own native data types) and marshalled to Python by PyO3, without the need of dumps in any file (not the source, nor anything else) - as I said, it's not needed, and it's also against your desiderata (i.e. caching)
The idea of moving to Python is really to avoid the template (with or without a suitable engine). This is because I appreciate more libraries than code generators (less copy-paste, version management, and similar benefits). They are even easier to test and debug, because they are native in the language they are written in (while you'd need to use Python to test the script generated from Rust).
I think the crucial point that you dispute is whether we want to directly output the generated plot or a plotting script that in turn generates the plot (where the code sits is another question). I chose the latter for several reasons:
pineappl plot ... | python
and directly get the plot; I don't need to bother with any additional filesI don't think you can avoid having a code generator, unless you invent an API that sits on top of matplotlib, for instance (is that what you meant?). However, I don't see the added value of that - for me matplotlib IS the API to plot the data. Which advantages do you see?
I don't think you can avoid having a code generator, unless you invent an API that sits on top of matplotlib, for instance (is that what you meant?). However, I don't see the added value of that - for me matplotlib IS the API to plot the data. Which advantages do you see?
Actually, this is the whole point, and nothing else.
Consider that partly you're already making such an API: https://github.com/NNPDF/pineappl/blob/e71b39acddaf2d09199fdb298f5d5afe38e771b2/pineappl_cli/src/plot.py#L104-L161 and the functions present have already enough arguments to be very customizable, without the need of patching them https://github.com/NNPDF/pineappl/blob/e71b39acddaf2d09199fdb298f5d5afe38e771b2/pineappl_cli/src/plot.py#L275-L284
The idea is that, if you want to do something very basic, your script could be a regular Python script, possibly with 3-4 lines only, importing pineappl.plot
and just stating what do you want.
Instead, if you want to do something more convolute, you can avoid sed
ing a Python script, but rather make loops in Python and work with the data themselves (rather than their code representation). It's one layer less.
The API would be just the script, maybe slightly rearranged, but no more. I.e.:
percent_diff
and set_ylim
main
handler (that puts in the usual column the desired panels)data
/metadata
loaderContent of 2. and 3. are most likely always repeated. Instead, 1. is intended as a default, but the customization requires code manipulation, rather than passing different values. For 4. you would not be able to customize the code, but my guess is that you frequently don't need, since small modifications can be done by passing different values. If you instead need a quite different plot, you could make your own, even copying from the source one of the existing ones as an example (to me, it would be an improvement, but I expect this to rarely happen anyhow).
The last is just what we were discussing at the beginning.
Currently,
pineappl plot
relies on Python and Matplotlib. This is actually quite nice, because people using PineAPPL are often familiar with Python, and know how to plot the information, once given.A current mild pain-point is that there is a lot of string manipulation involved in the call, since a Python script is generated everytime, embedding the data: https://github.com/NNPDF/pineappl/blob/3476629d40f4b4af164ce067ec2d029781be048c/pineappl_cli/src/plot.rs#L508-L521
However, dumping the script is a design featured, in such a way that people are able to customize it.
I see a few possible ways of improving:
pineappl plot
operations: dump the script and the data separately (in a common data format, like CSV)pineappl_py
in the script, and directly extract the data from the grid (requires the installation of the PineAPPL python extension)pineappl_py
itself (possibly with apineappl-plot
CLI, but it would be installed separately, at least until #176 will be completed)