NNPDF / reportengine

A framework for declarative data analysis
https://data.nnpdf.science/validphys-docs/guide.html
GNU General Public License v2.0

Document dynamic provider dispatch in reportengine #63

Open Zaharid opened 5 years ago

Zaharid commented 5 years ago

The idea is that you can use the explicit_node decorator in the config parser to return functions, which then have the role of providers.

One can do something like:

#provider
def action1(some, required, resources):
    ...

def action2(totally, different, resources):
    ...

def generic_table(dispatch_action):
    # A table that works for the result of both action1 and action2
    ...

def generic_plot(generic_table):
    # A plot that works with the output of generic_table and does not
    # care whether it came from action1 or action2
    ...

#config
from reportengine import configparser
from reportengine.configparser import ConfigError

class Config(configparser.Config):
    @configparser.explicit_node
    def produce_dispatch_action(self, dispatch_value: str):
        # Return the provider function itself, not its result
        if dispatch_value == "action1":
            return action1
        elif dispatch_value == "action2":
            return action2
        raise ConfigError(...)

Then one could do:

#runcard.yaml
dispatch_value: "action1"
some: ...
required: ...
input_for_resource_called_resources: ...
actions_:
  - generic_plot
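To see why returning a function is enough, here is a toy resolver written in plain Python. It is not how reportengine is implemented internally; it only assumes that arguments are resolved by name, either from the runcard or from other providers, and that the function returned by the explicit node is then called like any other provider. The function bodies are hypothetical placeholders, and `resources` is given directly rather than built from `input_for_resource_called_resources`.

```python
import inspect

# Toy model only: NOT reportengine's implementation.
def action1(some, required, resources):
    return {"source": "action1", "value": some + required + resources}

def action2(totally, different, resources):
    return {"source": "action2", "value": totally * different * resources}

def generic_table(dispatch_action):
    # Works with the output of either action
    return f"{dispatch_action['source']}: {dispatch_action['value']}"

def generic_plot(generic_table):
    # Does not care which action produced the table
    return f"plot of [{generic_table}]"

# Stand-in for the explicit_node production rule: it returns a function
def produce_dispatch_action(dispatch_value):
    return {"action1": action1, "action2": action2}[dispatch_value]

PROVIDERS = {"generic_table": generic_table, "generic_plot": generic_plot}
EXPLICIT_NODES = {"dispatch_action": produce_dispatch_action}

def call_with_namespace(func, namespace):
    # Resolve every parameter of func by name, recursively
    kwargs = {p: resolve(p, namespace) for p in inspect.signature(func).parameters}
    return func(**kwargs)

def resolve(name, namespace):
    if name in namespace:  # plain runcard input
        return namespace[name]
    if name in EXPLICIT_NODES:
        # The production rule picks a function, which then acts as the provider
        provider = call_with_namespace(EXPLICIT_NODES[name], namespace)
        return call_with_namespace(provider, namespace)
    if name in PROVIDERS:
        return call_with_namespace(PROVIDERS[name], namespace)
    raise KeyError(f"cannot resolve {name!r}")

runcard1 = {"dispatch_value": "action1", "some": 1, "required": 2, "resources": 3}
runcard2 = {"dispatch_value": "action2", "totally": 2, "different": 3, "resources": 4}
print(resolve("generic_plot", runcard1))  # plot of [action1: 6]
print(resolve("generic_plot", runcard2))  # plot of [action2: 24]
```

Only `dispatch_value` changes between the two runcards; `generic_table` and `generic_plot` are written once.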

This has the crucial advantage that the rest of the pipeline does not have to be rewritten (i.e. there is only one generic_table). Runtime dispatch (i.e. one big substitute for generic_table that takes the inputs of both action1 and action2) does not work well when the various actions have completely different inputs. One disadvantage is that it obscures the help output.
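The help problem comes from the fact that the inputs a runcard needs are only known once `dispatch_value` has been chosen. A toy illustration, assuming a hypothetical `DISPATCH` table that mirrors `produce_dispatch_action`, lists the direct arguments of each selected provider with `inspect.signature`:

```python
import inspect

def action1(some, required, resources):
    ...

def action2(totally, different, resources):
    ...

# Hypothetical mapping mirroring produce_dispatch_action
DISPATCH = {"action1": action1, "action2": action2}

def direct_inputs(dispatch_value):
    """Names the selected provider needs resolved (from the runcard or other providers)."""
    return list(inspect.signature(DISPATCH[dispatch_value]).parameters)

for value in DISPATCH:
    print(value, direct_inputs(value))
```

A help page for `generic_table` can only show `dispatch_action`; the two very different sets of inputs above are hidden behind it.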

This should all be written in the guide somehow.

Zaharid commented 5 years ago

cc @RosalynLP

RosalynLP commented 5 years ago

So this would mean we could, for example, use the theory covmats generated before and not have to run the whole pipeline again?

Zaharid commented 5 years ago

No, this is a totally different thing. This is to avoid having to rewrite many functions that differ only in the names of the inputs they take. For example, if you have total_covmat and theory_covmat and you want to do some complicated analysis with one of them, chaining several providers, and then you want to do the same analysis with the other, you have to write new versions of all the intermediate and final providers. This helps reduce that duplication.
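A minimal sketch of the covmat case in plain Python, assuming hypothetical stand-ins: the real `total_covmat` and `theory_covmat` providers have different signatures, and `produce_chosen_covmat`, `covmat_choice`, `covmat_diagonal` and `relative_diagonal` are illustrative names only. The point is that the downstream chain depends on `chosen_covmat` alone, so it exists once.

```python
import inspect

# Hypothetical stand-ins for the real covmat providers
def theory_covmat(th_input):
    return th_input

def total_covmat(exp_input, th_input):
    return [[e + t for e, t in zip(erow, trow)] for erow, trow in zip(exp_input, th_input)]

# Stand-in for an explicit_node production rule choosing the provider
def produce_chosen_covmat(covmat_choice):
    return {"total": total_covmat, "theory": theory_covmat}[covmat_choice]

# Downstream chain, written once in terms of chosen_covmat
def covmat_diagonal(chosen_covmat):
    return [row[i] for i, row in enumerate(chosen_covmat)]

def relative_diagonal(covmat_diagonal):
    total = sum(covmat_diagonal)
    return [d / total for d in covmat_diagonal]

def run_analysis(covmat_choice, namespace):
    provider = produce_chosen_covmat(covmat_choice)
    # Fill the chosen provider's arguments by name from the namespace
    args = {p: namespace[p] for p in inspect.signature(provider).parameters}
    return relative_diagonal(covmat_diagonal(provider(**args)))

namespace = {"exp_input": [[1.0, 0.0], [0.0, 3.0]], "th_input": [[1.0, 0.0], [0.0, 1.0]]}
print(run_analysis("theory", namespace))  # [0.5, 0.5]
print(run_analysis("total", namespace))
```

Without the dispatch, `covmat_diagonal` and `relative_diagonal` would each need a `total_` and a `theory_` variant.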

Zaharid commented 5 years ago

As for reusing the covmats, there is nothing preventing you from writing the output to a file and loading it in a production rule. There are many (and more complicated) examples of how to do that in the paramfits module. The problem is that I would have a hard time trusting anything, given the amount of bookkeeping it involves. I remember I had the bright idea of removing a "redundant" check for αs, and then the first runcard I wrote was loading the wrong file. Having everything computed in one go avoids a whole class of bugs. Of course there is a solution, namely NNPDF/nnpdf#224 (basically, loading and storing entire namespaces), but that is not going to implement itself...

In the meantime it would be interesting to look at this thing:

https://github.com/ebanner/pynt/
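The bookkeeping mentioned above can be sketched as storing the inputs a matrix was computed from next to the matrix, and checking them on load. This is a hypothetical illustration using only the standard library, not the paramfits code: `store_covmat`, `produce_loaded_covmat` and the `theoryid`/`alphas` metadata keys are assumptions, and a real reportengine production rule would more naturally raise `ConfigError` instead of `ValueError`.

```python
import json
import os
import tempfile

def store_covmat(path, covmat, metadata):
    # Write the matrix together with the inputs it was computed from
    with open(path, "w") as f:
        json.dump({"metadata": metadata, "covmat": covmat}, f)

def produce_loaded_covmat(path, expected_metadata):
    # Hypothetical production rule: the metadata comparison is exactly
    # the kind of "redundant" check that must not be removed
    with open(path) as f:
        stored = json.load(f)
    mismatches = {
        key: {"stored": stored["metadata"].get(key), "expected": value}
        for key, value in expected_metadata.items()
        if stored["metadata"].get(key) != value
    }
    if mismatches:
        raise ValueError(f"stored covmat does not match the runcard: {mismatches}")
    return stored["covmat"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "covmat.json")
    store_covmat(path, [[1.0, 0.5], [0.5, 2.0]], {"theoryid": 53, "alphas": 0.118})
    print(produce_loaded_covmat(path, {"theoryid": 53, "alphas": 0.118}))
```

Every input that can change the result has to be recorded and compared, which is why computing everything in one go is the safer default.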

RosalynLP commented 5 years ago

So what if you want to do the same kind of thing multiple times with different inputs? For example, the heat plots.

Could you provide a list of actions under dispatch_action, each corresponding to a different matrix, and then send all of these to the final action in turn? There is the complication that you would also need to somehow supply titles, plot ranges, etc., which might differ for each one.
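One way to picture this, in the same toy style rather than any existing reportengine feature, is to give each matrix its own small namespace carrying its dispatch choice, title and range, and run the single downstream provider once per entry. All names here (`matrix_choice`, `vrange`, `heatmap_spec`, `run_for_each`, the two matrix providers) are hypothetical.

```python
import inspect

# Hypothetical matrix providers
def experimental_matrix(exp_input):
    return exp_input

def theory_matrix(th_input):
    return th_input

# Stand-in for an explicit_node production rule returning a provider
def produce_heatmap_matrix(matrix_choice):
    return {"experimental": experimental_matrix, "theory": theory_matrix}[matrix_choice]

def heatmap_spec(heatmap_matrix, title, vrange):
    # Single downstream "plot" provider; here it only clips data to the range
    vmin, vmax = vrange
    clipped = [[min(max(x, vmin), vmax) for x in row] for row in heatmap_matrix]
    return {"title": title, "data": clipped}

def run_for_each(entries, shared):
    results = []
    for entry in entries:
        namespace = {**shared, **entry}  # per-entry keys override shared ones
        provider = produce_heatmap_matrix(namespace["matrix_choice"])
        args = {p: namespace[p] for p in inspect.signature(provider).parameters}
        results.append(heatmap_spec(provider(**args), namespace["title"], namespace["vrange"]))
    return results

shared = {"exp_input": [[1.0, 5.0]], "th_input": [[0.2, 0.4]]}
entries = [
    {"matrix_choice": "experimental", "title": "Exp", "vrange": (0, 2)},
    {"matrix_choice": "theory", "title": "Th", "vrange": (0, 0.3)},
]
specs = run_for_each(entries, shared)
print([s["title"] for s in specs])  # ['Exp', 'Th']
```

Titles and ranges travel with each entry, so the final action stays generic.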