NNPDF / reportengine

A framework for declarative data analysis
https://data.nnpdf.science/validphys-docs/guide.html
GNU General Public License v2.0
1 stars 2 forks source link

Modifications to reportengine to allow for validphys.api module #18

Closed wilsonmr closed 5 years ago

wilsonmr commented 5 years ago

title explains everything, in particular want to be able to not perform final with actions and not have to specify output path

this probably requires some actual testing since it could break some things

Zaharid commented 5 years ago

One problem is that we would like to avoid dumping invalid or null folders to the namespace. We do that in a very convoluted way in here:

https://github.com/NNPDF/reportengine/blob/13baf2aadcd1439b4576d985937c80f5d97fa683/src/reportengine/environment.py#L119

Instead we would like that the keys remain undefined, so interactive users can (and have to) define the paths they want.

Zaharid commented 5 years ago

Should we make some of the changes in validphys here instead. In particular I imagined that there would be something like reportengine.API.

Zaharid commented 5 years ago

Note that at the moment the keys that get dumped are defined here, to maximize the confusion mostly:

https://github.com/NNPDF/reportengine/blob/13baf2aadcd1439b4576d985937c80f5d97fa683/src/reportengine/environment.py#L112

Zaharid commented 5 years ago

It would be great if API had a __dir__ attribute, possibly implementing something like this:

https://github.com/NNPDF/reportengine/blob/13baf2aadcd1439b4576d985937c80f5d97fa683/src/reportengine/utils.py#L79

wilsonmr commented 5 years ago

I think I should go through the app module and work out which parts of that should be accessible from the API and then work a bit harder and integrating the two

Zaharid commented 5 years ago

On first look, this looks very nice. Could you please add some tests? I guess test_complexinput and test_builder are the closest places to look for inspiration. Ideally when you do

pytest --pyargs reportengine --cov-report html --cov

the stuff in test_api should cover the whole api file.

(after conda install pytest-cov)

wilsonmr commented 5 years ago

Ok I gave that a go, not sure if the tests are quite what we want but the coverage is 100% (which is quite easy at the moment given api is small)

Zaharid commented 5 years ago

Ok, this looks very nice now.

Zaharid commented 5 years ago

I am thinking a more serious thing is that the way it is now, every time you call an action, everything gets built from scratch. This is problematic as is, because for development you would want to compute expensive things once and then use them for interactive tinkering.

One possibility is that we have some horrible state (the namespace in the builder) that allows to submit actions that are functions of something that was computed previously. I guess what we are after is something like:

#providers 
def expensive(param): ...
def expensive2(expensive): ...
#...

api = get_API_somehow()

api.expensive(param='patata')

#what to do here??
api.expensive2()

#A notebook cell or similar
def plot_I_want_to_develop(expensive, expensive2):
   ...

Anyhow, I think I am going to merge this as is.

wilsonmr commented 5 years ago

interesting point, I was always thinking of the really simplistic case of api.provider(**config_params), having said that I think in your example you can do:

#providers 
def expensive(param): ...
def expensive2(expensive): ...
#...

api = get_API_somehow()

exp1 = api.expensive(param='patata')

#what to do here??
exp2 = api.expensive2(expensive=exp1)

#A notebook cell or similar
def plot_I_want_to_develop(expensive=exp1, expensive2=exp2):
   ...

right?

Zaharid commented 5 years ago

You can, but it can get tricky with many chained expensive. Anyhow, I don't know how the interface should look like, and there is some appeal in the thing being completely stateless. In any case if this is going to drive the app at some point, as it should, this point has to e addressed.

Zaharid commented 5 years ago

Maybe something like:

with make_context(api) as ctx:
    e1 = ctx('expensive', param=param)
    e2 = ctx('expensive2')