NNPDF / reportengine

A framework for declarative data analysis
https://data.nnpdf.science/validphys-docs/guide.html
GNU General Public License v2.0
1 stars 2 forks source link

Lockfiles #24

Closed wilsonmr closed 4 years ago

wilsonmr commented 4 years ago

Not sure if I'm quite doing this right

I have teaching this afternoon but thought I'd push what I've done

I'm not quite sure at the moment where these defaults are going to be taken into account when parsing the input

wilsonmr commented 4 years ago

ok I'm not sure why the figure stuff is changed. I took out the previous changes and just added some attributes in init methods of classes. The thing I'm really not sure about is the following:

If we have some record_* function we want to update the lockfile, which I guess starts as an empty dict like object with dict for each of the record_ functions.

Is it sufficient to have something like the mechanism by which we catch produce functions so somewhere inside of _resolve_key we use a get_record_func which then would add to the lockfile with a specific key if not None?

Then as you mention in https://github.com/NNPDF/nnpdf/issues/496#issuecomment-556033703 we use a production rule to actually fill the defaults from either the runcard if it is a lock file or the defaults if not

Sorry for dumb questions I'm struggling to get my head round how this will work in practise

wilsonmr commented 4 years ago

Ok I rebased on master and got rid of some weird changes I accidentally added. I also removed things from environment for time being and added roughly what I was thinking above to config.

This basically wraps record_ functions to fill the input_params (if key doesn't exist i.e not a lockfile itself) Then in the _resolve_key function we raise if there is both a production and record rule (they seem mutually exclusive) and then if there is a record rule and the key is not in the runcard it treats the record rule like a production rule, otherwise just use runcard key.

Worked with a very basic example in validphys:

class CoreConfig(configparser.Config):
...
    def record_foo(self, ):
        """ test record function """
        return {"foo": "bar"}

still not sure if I'm doing this in a really stupid way but thought it was easier to discuss if I implemented something

Zaharid commented 4 years ago

Probably we can chat in person. But I think the basic idea is fine. I would say the lock file starts off as identical to the runcard and the record functions can add additional stuff to it. Then it is up to the production rules to use the recorded information when present, rather than to recompute it. Once the details are ironed out, we can start to think of adding helpers to make is simpler. Speaking of helpers I am working on something that might be useful: https://github.com/Zaharid/validobj/

wilsonmr commented 4 years ago

Ok so to provide an example I added the following the the CoreConfig class in validphys.config:

    def record_foo(self, fooinput):
        # use fooinput to load from defaults file or whatever
        return {fooinput: {"bar": "eggs"}}

I then make the minimal runcard:

fooinput: spam

actions_:
 - foo

Then after running validphys the contents of output/input/lockfile.yaml are as follows:

fooinput: spam

actions_:
- foo
foo_recorded_spec_:
  spam:
    bar: eggs

I guess I should probably add some kind of tests but otherwise I think this is what was required in reportengine

Zaharid commented 4 years ago

The first thing I tried (meaning it's not the simplest possible example) is:

 $ git --no-pager diff
diff --git a/validphys2/src/validphys/config.py b/validphys2/src/validphys/config.py
index a0c9c35d..0957d169 100644
--- a/validphys2/src/validphys/config.py
+++ b/validphys2/src/validphys/config.py
@@ -100,6 +100,13 @@ def _id_with_label(f):

 class CoreConfig(configparser.Config):

+    def parse_foo(self, inp: numbers.Number):
+        return ['foo']*inp
+
+
+    def record_foo(self, foo):
+        return {len(foo): foo[0]}
+

     @property
     def loader(self):
Ns:
    - foo: 4
    - foo: 5

actions_:
    - Ns foo

This fails with a recursion error which I do not immediately understand.

The interface itself looks fine. Surely we can make a decorator that makes the whole thing simpler.

wilsonmr commented 4 years ago

hmm possibly there is something problematic with having both parse_X and record_X

I'll look into the decorator

wilsonmr commented 4 years ago

Not quite sure how to do the wrapper decorator approach, it seems like at some point the decorator would need to be implementation specific? I'm thinking we could have

@record_key
def parse_filter_default(self, inp):
    return inp

where record_key probably wants to return a dictionary, res based on the inp and also update the lockfile with the

filter_defaults_recorded_spec_:
    inp: res

I suppose we could have

class Config(metaclass=ConfigMetaClass):
    _defaults = None

then

def record_key(f):
    @functools.wraps(f)
    def f_(self, *args, **kwargs):
        res = f(self, *args, **kwargs)
        lockkey = trim_token(f.__name__) + "_recorded_spec_"
        if self._defaults is None:
            raise ConfigError("No defaults are set")
        try:
            mapping = self._defaults[res]
        except KeyError as e:
            raise ConfigError(
                f"No defaults exist for key: {res}") from e

        if lockkey not in self.lockfile:
            self.lockfile[lockkey] = {res: mapping}
        else:
            self.lockfile[lockkey].setdefault(res, mapping)
        return {res: mapping}
    return f_

Then when we subclass Config we can either set defaults as a dictionary or load them from a file or something more complicated

Zaharid commented 4 years ago

Hmm. I actually haven't given it that much thought. I'd say this is a generic enough situation:

def get_<some key>_from_application_defaults(<value_hint>: str):
   return <result>

should somehow magically transform into:

def produce_<some_key>(self, <value_hint>: str,  <some_key>_recorded_spec_: Optional[dict]):
   <load from mapping if value_hint is there otherwise compute it from app defaults>

def record_some_key(...):
   <write computed value in lock mapping>

But I'd be inclined to have first some examples implemented manually and then do the magic tricks once we are convinced they are sound.

wilsonmr commented 4 years ago

ok I had a go at the wrapper, with the current HEAD on this branch I made the following additions in validphys:

class CoreConfig(configparser.Config):

    _defaults = {"foo": "bar"}
...
    @record_from_defaults
    def parse_foo(self, fooval):
        return fooval

    def parse_foo_recorded_spec_(self, val):
        return val

    def produce_foo_rules(self, foo, foo_recorded_spec_=None):
        if foo_recorded_spec_:
            print("lockfile")
        else:
            print("runcard")

which seems to work as it should, the only annoying thing is if I don't include the parse_foo_recorded_spec_ it breaks, not sure why.. I think the only difference is that the wrapper should actually return {"foo": "bar"} instead of just "bar" for the purpose of the production rule

wilsonmr commented 4 years ago

Ok I made some changes, now the defaults acts like nested dictionaries, for example:

_defaults = {
        "foo": 
            {
                "eggs": "spam", 
                "4.0": "something"
            }
    }

AFAICT this does what we want now, I could probably document it better. The description of what the wrapper does seems really awkward at present

Zaharid commented 4 years ago

Could we make it into some sort of function call based protocol rather than a property of Config? One might want to initialize the defaults lazily on an as needed basis. They might be expensive to compute (e.g. might require parsing each plotting file for each dataset) and require some optional environment that might not be relevant for every action.

wilsonmr commented 4 years ago

like this? So now I add the following the validphys.config.CoreConfig:

    def load_default_foo(self, spec):
        _defaults = {"eggs": "spam", "4.0": "something"}
        try:
            return _defaults[spec]
        except KeyError as e:
            raise ConfigError(f"No foo defaults for spec: {spec}") from e

    @record_from_defaults
    def parse_foo(self, fooval):
        return fooval

    def produce_foo_rules(self, foo, foo_recorded_spec_=None):
        if foo_recorded_spec_:
            # do something
        else:
            # do something else
wilsonmr commented 4 years ago

but obviously load_default_* can do more expensive things and is only called on a key by key basis

Zaharid commented 4 years ago

I think this is looking good. Will test it ASAP.

Zaharid commented 4 years ago

I'd like to look at this in more detail. I have the impression it could be made simpler.

wilsonmr commented 4 years ago

you're possibly correct, it is a little bit complicated..

Zaharid commented 4 years ago

In particular I don't readily see what are we using the identity function for

On Fri, 31 Jan 2020, 15:33 wilsonmr, notifications@github.com wrote:

you're possibly correct, it is a little bit complicated..

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/NNPDF/reportengine/pull/24?email_source=notifications&email_token=ABLJWUWWKZOF427YBWSLJJ3RARAGNA5CNFSM4JQCFBP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKPALCI#issuecomment-580781449, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLJWUXE44RGFYRFFLYEPMTRARAGNANCNFSM4JQCFBPQ .

wilsonmr commented 4 years ago

let me know if that works for you, it appears to work for me

Zaharid commented 4 years ago

I am trying to write a test for this and there is something I don't understand: Why is the first parser function rewriting the input?

I.e. I would have imagined that something like this

from reportengine import configparser
from reportengine import api

FILTER_DEFAULTS = {'highpass': [8, 9, 10], 'lowpass': [1, 2, 3]}

class TestConfig(configparser.Config):
    @configparser.record_from_defaults
    def parse_filter_defaults(self, spec):
        return spec

    def load_default_filter_defaults(self, spec):
        return FILTER_DEFAULTS[spec]

    def produce_actual_filter(
        self, filter_defaults, filter_defaults_recorded_spec_=None
    ):
        if filter_defaults_recorded_spec_ is not None:
            return filter_defaults_recorded_spec_[filter_defaults]
        print(filter_defaults)
        return self.load_default_filter_defaults(filter_defaults)

class Providers:
    def data(self, actual_filter):
        return actual_filter

class Env:
    pass

TestAPI = api.API([Providers()], TestConfig, Env)

def test_defaults():
    assert TestAPI.data(filter_defaults='lowpass') == [1, 2, 3]

should work. but instead filter_defaults resolves to filter_defaults = {'lowpass': [1, 2, 3]} Also I would expect filter_defaults_recorded_spec_ to resolve to None in this case, but instead I see that it gets filled with {'lowpass': [1, 2, 3]}

wilsonmr commented 4 years ago

well at the moment the wrapper automatically get's the results for you: https://github.com/NNPDF/reportengine/blob/716a1d744e24e740a1c7d9d441e65d29f394133f/src/reportengine/configparser.py#L188 because we want to save to the lockfile the lowpass and what the resulting defaults were at the time [1, 2, 3]

I guess I thought it would then be easier to return the mapping rather than the input and having to re-evaluate the loaddefault* function but I guess it makes the production rule more awkward

I'm confused re the recordedspec not being None.. perhaps it's because I'm not copying the input params when setting self.lockfile: https://github.com/NNPDF/reportengine/blob/716a1d744e24e740a1c7d9d441e65d29f394133f/src/reportengine/configparser.py#L251 ?

Zaharid commented 4 years ago

On Wed, Feb 19, 2020 at 2:50 PM wilsonmr notifications@github.com wrote:

well at the moment the wrapper automatically get's the results for you: https://github.com/NNPDF/reportengine/blob/716a1d744e24e740a1c7d9d441e65d29f394133f/src/reportengine/configparser.py#L188 because we want to save to the lockfile the lowpass and what the resulting defaults were at the time [1, 2, 3]

I'd say that may be the case internally for working to get the defaults, but then we want to use [1,2,3] rather than the mapping. In practice the only thing one would do with that is next(iter(filter_default.values())). I'd say the parse function should leave its value unmodified (so we can know what the requirement was) and then we should be able to retrieve the computed value from somewhere.

Or at least making the return value a tuple, (spec, value) which is easier to use. But then it seems to me that you don't want to even try to compute the defaults if they are written in the runcard, and that would involve some amount of rearranging...

I guess I thought it would then be easier to return the mapping rather than the input and having to re-evaluate the loaddefault* function but I guess it makes the production rule more awkward

I'm confused re the recordedspec not being None.. perhaps it's because I'm not copying the input params when setting self.lockfile: https://github.com/NNPDF/reportengine/blob/716a1d744e24e740a1c7d9d441e65d29f394133f/src/reportengine/configparser.py#L251 ?

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/NNPDF/reportengine/pull/24?email_source=notifications&email_token=ABLJWUVC3XIDNL2MRDSI4ILRDVBK3A5CNFSM4JQCFBP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMIE6AY#issuecomment-588271363, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLJWUSQSWWVRQDVULXZLLTRDVBK3ANCNFSM4JQCFBPQ .

wilsonmr commented 4 years ago

actually I don't think it would be so much effort, we already check if *_recorded_spec_ exists in the wrapper so could easily just do that before loading the default, give me a second and I will push a few changes

wilsonmr commented 4 years ago

ok so

should work. but instead filter_defaults resolves to filter_defaults = {'lowpass': [1, 2, 3]} Also I would expect filter_defaults_recorded_spec_ to resolve to None in this case, but instead I see that it gets filled with {'lowpass': [1, 2, 3]}

in the first instance adding a deep copy solves the final issue

from copy import deepcopy
self.lockfile = deepcopy(input_params) 

I used deepcopy because I want to copy either whatever the object is that yaml loads, some kind of mapping or a dictionary in the case that we are using API. Perhaps a shallow copy would be sufficient? Or is there a cleaner way to do this?

Zaharid commented 4 years ago

I'd say a copy is fine. Maybe could be made to look more elegant with some ChainMap shenanigans, but I don't see the point.

On Wed, Feb 19, 2020 at 4:08 PM wilsonmr notifications@github.com wrote:

ok so

should work. but instead filter_defaults resolves to filter_defaults = {'lowpass': [1, 2, 3]} Also I would expect filter_defaults_recordedspec to resolve to None in this case, but instead I see that it gets filled with {'lowpass': [1, 2, 3]}

in the first instance adding a deep copy solves the final issue

from copy import deepcopyself.lockfile = deepcopy(input_params)

I used deepcopy because I want to copy either whatever the object is that yaml loads, some kind of mapping or a dictionary in the case that we are using API. Perhaps a shallow copy would be sufficient? Or is there a cleaner way to do this?

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/NNPDF/reportengine/pull/24?email_source=notifications&email_token=ABLJWUUO4SGEAMHP3D4DEPDRDVKOLA5CNFSM4JQCFBP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMILTNQ#issuecomment-588298678, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLJWUQWRH2LFQR3K4PZMTTRDVKOLANCNFSM4JQCFBPQ .

wilsonmr commented 4 years ago

is this better?

wilsonmr commented 4 years ago

ok I might be wrong but I would have thought the sensible way to use namespaces with defaults would be like:

nnpdf31_defaults_namespace:
    filter_defaults: "3.1"
some_other_defaults:
    filter_defaults: "4.0"
    data_defaults: "3.0"

actions_:
 - nnpdf31_defaults_namespace fun_action
 - some_other_default fun_action

but in the end there are no complicated namespaces with the defaults we just want to know what 3.1 and 4.0 were at the time, right? If the user is wanting to do something weirder with namespaces then surely they are no longer doing defaults and are instead manually setting various things which is fine but then that's already in the runcard.

Zaharid commented 4 years ago

Fine.

On Wed, 19 Feb 2020, 18:02 wilsonmr, notifications@github.com wrote:

ok I might be wrong but I would have thought the sensible way to use namespaces with defaults would be like:

nnpdf31_defaults_namespace: filter_defaults: "3.1"some_other_defaults: filter_defaults: "4.0" datadefaults: "3.0" actions:

  • nnpdf31_defaults_namespace fun_action
  • some_other_default fun_action

but in the end there are no complicated namespaces with the defaults we just want to know what 3.1 and 4.0 were at the time, right? If the user is wanting to do something weirder with namespaces then surely they are no longer doing defaults and are instead manually setting various things which is fine but then that's already in the runcard.

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/NNPDF/reportengine/pull/24?email_source=notifications&email_token=ABLJWUXL5NKGQKRZB47NBLTRDVX4ZA5CNFSM4JQCFBP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMI2S7Y#issuecomment-588360063, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLJWUX33PWGE5WLOMLNQZ3RDVX4ZANCNFSM4JQCFBPQ .

Zaharid commented 4 years ago

I am going to merge this so we can start working on it as discussed. It looks good to me as is.

wilsonmr commented 4 years ago

ok cool! I guess even if the underlying mechanism slightly changes, the lockfile format looks fairly stable

wilsonmr commented 4 years ago

@Zaharid with regards to this being merged, are we still waiting for something or should we be using this branch and start making defaults on the NNPDF/nnpdf#651 ?

Zaharid commented 4 years ago

I forgot to click the button!