LDMX-Software / ldmx-sw

The Light Dark Matter eXperiment simulation and reconstruction framework.
https://ldmx-software.github.io
GNU General Public License v3.0
22 stars 20 forks source link

Python Config Validation #1303

Open tomeichlersmith opened 2 years ago

tomeichlersmith commented 2 years ago

Is your feature request related to a problem? Please describe. It is very common to misspell parameters and occasionally have parameters with incorrect types. We can avoid this issue by only allowing the insertion of new parameter names to occur once at class construction.

Describe the solution you'd like I sketched out a solution that will check the existence and type of parameters when they are attempting to be set.

       : class Base :
    ...:     def __init__(self, **kwargs) :
    ...:         self.__dict__ = kwargs
    ...:     def __setattr__(self, n, v) :
    ...:         if n in self.__dict__ :
    ...:             if isinstance(v, type(self.__dict__[n])) :
    ...:                 self.__dict__[n] = v
    ...:             else :
    ...:                 raise AttributeError(f'{n} was given as type {type(v)} but should be {type(self.__dict__[n])}')
    ...:         elif n == '__dict__' :
    ...:             super().__setattr__(n,v)
    ...:         else :
    ...:             raise AttributeError(f'{n} is not a defined parameter.')

Then child classes would be defined so that their constructor defines the parameters that the class uses.

In [13]: class Child(Base) :
    ...:     def __init__(self) :
    ...:         super().__init__(one = 1, two = 2)
    ...: 

In [14]: c = Child()

In [15]: c.one
Out[15]: 1

In [16]: c.one = 2

In [17]: c.two
Out[17]: 2

In [18]: c.one
Out[18]: 2

In [19]: c.three
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-19-e851540052ba> in <module>
----> 1 c.three

AttributeError: 'Child' object has no attribute 'three'

Loopholes

You can get around this solution by using the hole created for setting the __dict__ member in the __init__ method.

old = c.__dict__
old['three'] = 3
c.__dict__ = old

This is helpful to know about in case the user needs to add a parameter that isn't in the python module or misspelled there for some reason. This is also hard enough to do that I don't see someone stumbling upon it.

tomeichlersmith commented 2 years ago

More fleshed out base class:

class Parameters :
    """Python configuration class to help validate existence and type

    The members of this class and its children are dynamically defined
    using the key-word arguments to the constructor. Then any later attempts
    to set its members (aka attributes) will fail if the member does not
    exist or the new value is the wrong type.

    Parameters
    ----------
    kwargs : dict
        Parameters and their default values
    """

    def __init__(self, **kwargs) :
        # explicitly use super here to avoid calling our customized __setattr__
        super().__setattr__('__dict__',kwargs)

    def __setattr__(self, name, value) :
        """Customize attribute setting mechanism

        Parameters
        ----------
        name : str
            Name of member attempting to be set (i.e. after the `.`)
        value
            new value for member (i.e. stuff after `=`)

        Raises
        ------
        AttributeError : if 'name' does not exist in the members yet
        AttributeError : if 'value' is not the same type as the member
        """

        if name in self.__dict__ :
            if self.__dict__[name] is None or isinstance(value,type(self.__dict__[name])) :
                # default value was None or they are the same instance
                self.__dict__[name] = value
            else :
                raise AttributeError(f'\'{self.__class__.__name__}\' parameter \'{name}\' is of type {type(self.__dict__[name])} and not {type(value)}')
        else :
            raise AttributeError(f'\'{self.__class__.__name__}\' does not have a parameter named \'{name}\'')
tomeichlersmith commented 2 years ago

Going to put this on the back-burner for a later major release. It would require updating all of the python modules to convert the old style of setting defaults to the new style of passing them into the super().__init__ call.

I could look at writing a python script which parses the python and converts old into new but that sounds like too much work at the moment.

Old Style

class MyParams(BaseClass) : 
    def __init__(self) : 
        super().__init__(base,req,params)
        self.mine = default_value

New Style

class MyParams(BaseClass) : 
    def __init__(self) : 
        super().__init__(base,req,params,
          mine = default_value)
tomeichlersmith commented 2 years ago

More discussion https://github.com/LDMX-Software/ldmx-sw/issues/1045

tomeichlersmith commented 2 years ago

@awhitbeck pointed out that __slots__ is a potential alternative. This would allow us to define the names of the class attributes at a class-level which can then be enforced downstream.

The snag that I can think of right now is that the Python-C++ translation relies on pulling the variables from the __dict__ member; however, it may be able to do the exact same process with the __slots__ member.

tomeichlersmith commented 2 years ago

__slots__ does prevent dynamically defining new attributres; however, it still allows the type to be changed. Still unsure on how we'd extract these variables on the C++ side.

In [1]: class Processor :
   ...:     __slots__ = 'name', 'class_name'
   ...: 

In [2]: class MyProc(Processor) :
   ...:     __slots__ = 'one', 'two'
   ...: 

In [3]: p = MyProc()

In [4]: p.one
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 p.one

AttributeError: 'MyProc' object has no attribute 'one'

In [5]: p.one = 1

In [6]: p.two = 2.

In [7]: p.three = 3
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 p.three = 3

AttributeError: 'MyProc' object has no attribute 'three'

In [8]: p.one = 1.
tomeichlersmith commented 2 months ago

I think what I want is something to define the boiler plate for me like dataclasses. It'd be sweet to be able to do something like

in config module

@processor('my::Processor','libMyModule.so') # define class and library here
class MyProcessor:
    # define parameters and their defaults here
    one: int = 1
    two: list[float] = [1.0, 2.0]

in config script

from MyModule import MyProcessor
MyProcessor(one = 1.0) # Exception because wrong type
MyProcessor(one = 2) # allowed change of default
p = MyProcessor()
p.one = 2 # change of default
p.three = 3 # exception because non-existent parameter

Now how to get to the point of being able to define the class decorator processor.

tomeichlersmith commented 2 months ago

Got it. After some hacking away to Chappell Roan, I was able to get something operational. Basically, we just hijack dataclass and use its internal mapping of known fields in order to implement a __setattr__ that validates any input attributes (for existence) and values (for type) before accepting them.

The full test file is copied below including the unittest I wrote, but the the nice part is that the following syntax is supported.


@parameter_set
class MyParams:
    foo: str = 'bar'

@processor("hello", "world")
class MyClass:
    one: int = 1
    two: float = 2.0
    name: str = 'foo'
    vec: list = [1, 2, 3]
    vec2d: list = [[0.0, 1.0],[-1.0,0.0]]
    p: MyParams = MyParams()

c = MyClass(one = 2.0) # fails, wrong type
c = MyClass(dne = 'dne') # fails, parameter doesn't exist
c = MyClass()
c.one = 2.0 # fails wrong type
c.dne = 'dne' # fails, parameter doesn't exist
c.vec = [1.0, 2, 3] # fails, entries of list are wrong type
Full File with Implementation and Testing ```python from dataclasses import dataclass, field def check_list(l, dimension, entry_type): if dimension == 1: for e in l: if not isinstance(e, entry_type): raise TypeError(f'Entry {e} is not of expected type {entry_type}') elif dimension == 2: for ll in l: check_list(ll, 1, entry_type) else: raise Exception(f'Dimension {dimension} not supported.') def validate_and_set_attr(self, attr, val): if attr not in self.__dataclass_fields__: raise KeyError(f'Attribute {attr} not a member of {self.__class__.__name__}') expected_type = self.__dataclass_fields__[attr].type if not isinstance(val, expected_type): raise TypeError(f'Attribute {attr} should be type {expected_type} instead of type {type(val)}.') if isinstance(val, list): expected_dimension = self.__dataclass_fields__[attr].metadata['dimension'] expected_entry_type = self.__dataclass_fields__[attr].metadata['entry_type'] check_list(val, expected_dimension, expected_entry_type) self.__dict__[attr] = val class create_list: def __init__(self, l): self._list = l def __call__(self): return self._list def parameter_set(_class = None, *, post_init = None, **required_parameters): def _decorator_impl(cls): # update post_init function to include library loading if post_init is not None: org_post_init = ( getattr(cls, '__post_init__') if hasattr(cls, '__post_init__') else lambda self: None ) def _full_post_init(self): post_init(self) org_post_init(self) cls.__post_init__ = _full_post_init # update setattr function to validate stuff as well cls.__setattr__ = validate_and_set_attr # add additional annotations that are required for name, value in required_parameters.items(): cls.__annotations__[name] = type(value) setattr(cls, name, field(default = value, init = False)) # process class variables so that mutable classes (mainly list) # are given to dataclass as a default_factory instead of a default for var in vars(cls): # filter out special "dunder" class variables # that start and end with double underscore if var.startswith('__') and var.endswith('__'): continue # only look at variables that have annotations with them # (as is done with dataclass) if var not in cls.__annotations__: continue the_value = getattr(cls, var) if isinstance(the_value, list): entry_type = None if len(the_value) == 0: # warning, unable to deduce type list should be print('WARNING: unable to deduce type list should be.') pass else: dimension = 1 entry_type = type(the_value[0]) if entry_type is list: dimension = 2 if len(the_value[0]) == 0: print('WARNING: unable to deduce type of entries in list') elif isinstance(the_value[0][0], list): raise TypeError('Python configuration only supports up to 2D lists.') else: entry_type = type(the_value[0][0]) check_list(the_value, dimension, entry_type) setattr( cls, var, field( default_factory=create_list(the_value), metadata = { 'entry_type': entry_type, 'dimension': dimension } ) ) return dataclass()(cls) if _class is None: return _decorator_impl else: return _decorator_impl(_class) def processor_post_init(self): if self.moduleName.endswith('.so'): pass def processor(class_name: str, module_name: str): return parameter_set( post_init = processor_post_init, className = class_name, moduleName = module_name ) ######################################################################################### # TESTING @parameter_set class MyParams: foo: str = 'bar' @processor("hello", "world") class MyClass: one: int = 1 two: float = 2.0 name: str = 'foo' vec: list = [1, 2, 3] vec2d: list = [[0.0, 1.0],[-1.0,0.0]] p: MyParams = MyParams() import unittest class TestParameter(unittest.TestCase): def assertListEqual(self, lhs, rhs): self.assertEqual(len(lhs), len(rhs)) for l, r in zip(lhs, rhs): self.assertEqual(l, r) def assertMyClass(self, c, *, one, two, name, vec): self.assertEqual(c.className, 'hello') self.assertEqual(c.one, one) self.assertEqual(c.two, two) self.assertEqual(c.name, name) self.assertListEqual(c.vec, vec) def test_defaults(self): c = MyClass() self.assertMyClass(c, one = 1, two = 2.0, name = 'foo', vec = [1, 2, 3]) def test_change_after_creation(self): c = MyClass() c.one = 2 self.assertMyClass(c, one = 2, two = 2.0, name = 'foo', vec = [1, 2, 3]) c.name = 'bar' self.assertMyClass(c, one = 2, two = 2.0, name = 'bar', vec = [1, 2, 3]) c.two *= 2 self.assertMyClass(c, one = 2, two = 4.0, name = 'bar', vec = [1, 2, 3]) c.vec = [4, 5] self.assertMyClass(c, one = 2, two = 4.0, name = 'bar', vec = [4, 5]) self.assertEqual(c.p.foo, 'bar') c.p.foo = 'baz' self.assertEqual(c.p.foo, 'baz') def test_change_during_creation(self): c = MyClass(one = 2, two = 3.0, name = 'bar', vec = [4, 5], p = MyParams(foo = 'baz')) self.assertMyClass(c, one = 2, two = 3.0, name = 'bar', vec = [4, 5]) self.assertEqual(c.p.foo, 'baz') def test_basic_errors(self): with self.assertRaises(TypeError): MyClass(one = 2.0) with self.assertRaises(TypeError): MyClass(vec = 2.0) with self.assertRaises(TypeError): MyClass(vec = [1.0]) with self.assertRaises(TypeError): MyClass(vec = [1, 1.0]) c = MyClass() with self.assertRaises(TypeError): c.one = 2.0 with self.assertRaises(TypeError): c.vec = 2.0 with self.assertRaises(KeyError): c.dne = 'does not exist' with self.assertRaises(KeyError): c.p.dne = 'does not exist' with self.assertRaises(TypeError): c.p.foo = 1 with self.assertRaises(TypeError): c = MyClass(dne = 'does not exist') with self.assertRaises(TypeError): c = MyClass(p = 1) def test_meta_errors(self): with self.assertRaises(TypeError): @parameter_set() class BadVec: vec: list = [1, 1.0] with self.assertRaises(TypeError): @parameter_set() class Bad2DVec: vec: list = [[2, 3], 1] if __name__ == '__main__': unittest.main() ```
tomeichlersmith commented 1 month ago

From #1458 I realized that it would be nice to have a method for specifying legacy/deprecated parameter names. This would include

Idea outline

@parameter_set
class SubParameters:
    param: float = 1.0

@parameter_set
class MyParameters:
    new_param1: float = 1.0
    new_param2: float = 1.0
    sub: SubParameters
    __legacy__ = {
        'old_param1' : 'new_param1', # simple remap, just would change name silently
        'old_param2' : ('new_param1', True), # True signifies that we should add deprecation message
        'old_param3': 'sub.param', # should be able to propagate to sub-parameter-sets by splitting on `.` character
    }

Maybe just have a separate dunder class variable called __deprecate__ instead of having to support the clunky 2-tuple?

tvami commented 1 month ago

Yes, this seems to be already in a good shape, will you PR it?

tomeichlersmith commented 1 month ago

I have not gotten around to going through and updating all the downstream python modules which is the main thing preventing me from merging it. It is high on my list since I think parameter mis-spellings is one of the most vexing issues we see on the regular.