Open tomeichlersmith opened 2 years ago
More fleshed out base class:
class Parameters :
    """Python configuration class to help validate existence and type

    The members of this class and its children are dynamically defined
    using the key-word arguments to the constructor. Then any later attempts
    to set its members (aka attributes) will fail if the member does not
    exist or the new value is the wrong type.

    Parameters
    ----------
    kwargs : dict
        Parameters and their default values
    """

    def __init__(self, **kwargs) :
        # explicitly use super here to avoid calling our customized __setattr__
        super().__setattr__('__dict__',kwargs)

    def __setattr__(self, name, value) :
        """Customize attribute setting mechanism

        Parameters
        ----------
        name : str
            Name of member attempting to be set (i.e. after the `.`)
        value
            new value for member (i.e. stuff after `=`)

        Raises
        ------
        AttributeError : if 'name' does not exist in the members yet
        AttributeError : if 'value' is not the same type as the member
        """
        if name in self.__dict__ :
            if self.__dict__[name] is None or isinstance(value,type(self.__dict__[name])) :
                # default value was None or the new value has the same type as the current one
                self.__dict__[name] = value
            else :
                raise AttributeError(f'\'{self.__class__.__name__}\' parameter \'{name}\' is of type {type(self.__dict__[name])} and not {type(value)}')
        else :
            raise AttributeError(f'\'{self.__class__.__name__}\' does not have a parameter named \'{name}\'')
Going to put this on the back-burner for a later major release. It would require updating all of the Python modules to convert the old style of setting defaults to the new style of passing them into the `super().__init__` call.
I could look at writing a Python script which parses the Python and converts old into new, but that sounds like too much work at the moment.
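# old style: defaults are set as attributes after the base constructor
class MyParams(BaseClass) :
    def __init__(self) :
        super().__init__(base,req,params)
        self.mine = default_value

# new style: defaults are passed into the base constructor
class MyParams(BaseClass) :
    def __init__(self) :
        super().__init__(base,req,params,
                         mine = default_value)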
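As a rough sketch of how the new style behaves, here is a condensed copy of the `Parameters` base class from above together with a child class. The parameter names `threshold` and `label` and the helper `try_set` are hypothetical and only there for illustration.

```python
class Parameters:
    """Condensed copy of the base class sketched above."""

    def __init__(self, **kwargs):
        # bypass the customized __setattr__ while installing the defaults
        super().__setattr__('__dict__', kwargs)

    def __setattr__(self, name, value):
        if name not in self.__dict__:
            raise AttributeError(
                f"'{type(self).__name__}' does not have a parameter named '{name}'")
        current = self.__dict__[name]
        if current is not None and not isinstance(value, type(current)):
            raise AttributeError(
                f"'{type(self).__name__}' parameter '{name}' is of type "
                f"{type(current)} and not {type(value)}")
        self.__dict__[name] = value


class MyParams(Parameters):
    def __init__(self):
        # new style: all defaults are handed to the base class constructor
        # (threshold and label are hypothetical parameter names)
        super().__init__(threshold=1.0, label=None)


def try_set(obj, name, value):
    """Return None on success, or the error message if the assignment is rejected."""
    try:
        setattr(obj, name, value)
    except AttributeError as err:
        return str(err)
    return None


p = MyParams()
ok = try_set(p, 'threshold', 2.5)             # same type as the default: accepted
misspelled = try_set(p, 'treshold', 2.5)      # unknown name: rejected
wrong_type = try_set(p, 'threshold', 'high')  # str instead of float: rejected
any_type = try_set(p, 'label', 'ecal')        # default was None: any type accepted
```

With the old style, the `self.mine = default_value` line would itself be rejected, since `mine` is not yet a known parameter when it is assigned. That is why the downstream modules would need converting.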
More discussion https://github.com/LDMX-Software/ldmx-sw/issues/1045
@awhitbeck pointed out that `__slots__` is a potential alternative. This would allow us to define the names of the class attributes at the class level, which can then be enforced downstream.
The snag that I can think of right now is that the Python-C++ translation relies on pulling the variables from the `__dict__` member; however, it may be able to do the exact same process with the `__slots__` member.
`__slots__` does prevent dynamically defining new attributes; however, it still allows the type to be changed. Still unsure on how we'd extract these variables on the C++ side.
In [1]: class Processor :
   ...:     __slots__ = 'name', 'class_name'
   ...:
In [2]: class MyProc(Processor) :
   ...:     __slots__ = 'one', 'two'
   ...:
In [3]: p = MyProc()
In [4]: p.one
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 p.one
AttributeError: 'MyProc' object has no attribute 'one'
In [5]: p.one = 1
In [6]: p.two = 2.
In [7]: p.three = 3
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 p.three = 3
AttributeError: 'MyProc' object has no attribute 'three'
In [8]: p.one = 1.
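On the extraction question, one possible Python-side approach is to rebuild a `__dict__`-like mapping by walking the class hierarchy and reading each class's `__slots__`. The helper `slot_values` below is a hypothetical name, and this sketch says nothing about how the C++ side would consume the result.

```python
class Processor:
    __slots__ = 'name', 'class_name'


class MyProc(Processor):
    __slots__ = 'one', 'two'


def slot_values(obj):
    """Collect {name: value} for every slot that has been assigned on obj."""
    values = {}
    # walk base classes first so the order mirrors the definition order
    for klass in reversed(type(obj).__mro__):
        # look only at slots declared on this class, not inherited ones
        slots = klass.__dict__.get('__slots__', ())
        if isinstance(slots, str):
            slots = (slots,)  # a single string declares a single slot
        for name in slots:
            # unassigned slots raise AttributeError on access, so skip them
            if hasattr(obj, name):
                values[name] = getattr(obj, name)
    return values


p = MyProc()
p.name = 'proc'
p.one = 1
extracted = slot_values(p)  # only 'name' and 'one' have been assigned
```

Slots that were never assigned are skipped here. Whether they should instead be reported as missing parameters is a design choice left open.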
I think what I want is something to define the boilerplate for me like dataclasses. It'd be sweet to be able to do something like
# in config module
@processor('my::Processor','libMyModule.so') # define class and library here
class MyProcessor:
    # define parameters and their defaults here
    one: int = 1
    two: list[float] = [1.0, 2.0]

# in config script
from MyModule import MyProcessor
MyProcessor(one = 1.0) # Exception because wrong type
MyProcessor(one = 2) # allowed change of default
p = MyProcessor()
p.one = 2 # change of default
p.three = 3 # exception because non-existent parameter
Now how to get to the point of being able to define the class decorator `processor`.
Got it. After some hacking away to Chappell Roan, I was able to get something operational. Basically, we just hijack `dataclass` and use its internal mapping of known fields in order to implement a `__setattr__` that validates any input attributes (for existence) and values (for type) before accepting them.
The full test file is copied below including the `unittest` I wrote, but the nice part is that the following syntax is supported.
@parameter_set
class MyParams:
    foo: str = 'bar'

@processor("hello", "world")
class MyClass:
    one: int = 1
    two: float = 2.0
    name: str = 'foo'
    vec: list = [1, 2, 3]
    vec2d: list = [[0.0, 1.0],[-1.0,0.0]]
    p: MyParams = MyParams()

c = MyClass(one = 2.0) # fails, wrong type
c = MyClass(dne = 'dne') # fails, parameter doesn't exist
c = MyClass()
c.one = 2.0 # fails, wrong type
c.dne = 'dne' # fails, parameter doesn't exist
c.vec = [1.0, 2, 3] # fails, entries of list are wrong type
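The general idea can be sketched as follows. This is not the test file mentioned above, only a minimal illustration that assumes plain class annotations and immutable defaults. A bare `dataclass` rejects list defaults such as `vec`, so the real decorator must handle those separately. The `processor` decorator and the per-entry list checks are not covered, and raising `TypeError` for type mismatches is an assumption.

```python
import dataclasses


def parameter_set(cls):
    """Turn cls into a dataclass whose attribute assignments are validated."""
    cls = dataclasses.dataclass(cls)

    def validated_setattr(self, name, value):
        # dataclasses.fields gives the known parameters and their annotations
        known = {f.name: f for f in dataclasses.fields(self)}
        if name not in known:
            raise AttributeError(
                f"'{type(self).__name__}' does not have a parameter named '{name}'")
        expected = known[name].type
        # only plain classes are checked; generic annotations are skipped here
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"'{type(self).__name__}' parameter '{name}' is of type "
                f"{expected} and not {type(value)}")
        object.__setattr__(self, name, value)

    # the generated __init__ assigns through self.<name> = ..., so constructor
    # arguments pass through the same validation
    cls.__setattr__ = validated_setattr
    return cls


@parameter_set
class MyParams:
    foo: str = 'bar'
    one: int = 1


c = MyParams()
c.one = 5  # same type as the annotation: accepted
```

Because the check lives in `__setattr__`, both `MyParams(one = 2.0)` and `c.one = 2.0` are rejected by the same code path, while an unknown keyword such as `MyParams(dne = 'dne')` is already rejected by the generated constructor.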
From #1458 I realized that it would be nice to have a method for specifying legacy/deprecated parameter names. This would involve having `__setattr__` forward "old" parameter names to "new" parameter names (including potentially moving to a sub-parameter-set like is done with the logging parameters).
Idea outline:
@parameter_set
class SubParameters:
    param: float = 1.0

@parameter_set
class MyParameters:
    new_param1: float = 1.0
    new_param2: float = 1.0
    sub: SubParameters
    __legacy__ = {
        'old_param1' : 'new_param1', # simple remap, just would change name silently
        'old_param2' : ('new_param1', True), # True signifies that we should add deprecation message
        'old_param3': 'sub.param', # should be able to propagate to sub-parameter-sets by splitting on `.` character
    }
Maybe just have a separate dunder class variable called `__deprecate__` instead of having to support the clunky 2-tuple?
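A minimal sketch of the forwarding idea, following the outline with its 2-tuple form rather than a separate `__deprecate__` variable. The mixin name `LegacyAware` is hypothetical, and plain classes stand in for the `@parameter_set` decorator so that the example stays self-contained.

```python
import warnings


class LegacyAware:
    """Hypothetical mixin forwarding old parameter names listed in __legacy__."""

    __legacy__ = {}

    def __setattr__(self, name, value):
        if name in self.__legacy__:
            target = self.__legacy__[name]
            deprecated = False
            if isinstance(target, tuple):
                # 2-tuple form: (new name, whether to print a deprecation message)
                target, deprecated = target
            if deprecated:
                warnings.warn(f"'{name}' is deprecated, use '{target}' instead",
                              DeprecationWarning, stacklevel=2)
            # split on '.' to reach parameters inside sub-parameter-sets
            *parents, leaf = target.split('.')
            owner = self
            for attr in parents:
                owner = getattr(owner, attr)
            setattr(owner, leaf, value)
            return
        super().__setattr__(name, value)


class SubParameters(LegacyAware):
    def __init__(self):
        self.param = 1.0


class MyParameters(LegacyAware):
    __legacy__ = {
        'old_param1': 'new_param1',          # silent rename
        'old_param2': ('new_param1', True),  # rename with deprecation message
        'old_param3': 'sub.param',           # moved into a sub-parameter-set
    }

    def __init__(self):
        self.new_param1 = 1.0
        self.new_param2 = 1.0
        self.sub = SubParameters()


p = MyParameters()
p.old_param1 = 3.0  # lands in p.new_param1
p.old_param3 = 5.0  # lands in p.sub.param
```

Forwarding through `setattr` on the owning object means any validation in that object's own `__setattr__` would still apply to the new name.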
Yes, this seems to be already in a good shape, will you PR it?
I have not gotten around to going through and updating all the downstream Python modules, which is the main thing preventing me from merging it. It is high on my list since I think parameter misspellings are one of the most vexing issues we see on the regular.
Is your feature request related to a problem? Please describe.
It is very common to misspell parameters and occasionally have parameters with incorrect types. We can avoid this issue by only allowing the insertion of new parameter names to occur once at class construction.
Describe the solution you'd like
I sketched out a solution that checks the existence and type of parameters whenever they are set.
Then child classes would be defined so that their constructor defines the parameters that the class uses.
Loopholes
You can get around this solution by using the hole created for setting the `__dict__` member in the `__init__` method. This is helpful to know about in case the user needs to add a parameter that isn't in the Python module or is misspelled there for some reason. This is also hard enough to do that I don't see someone stumbling upon it.
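For illustration, here are two ways the check could be bypassed, assuming a condensed version of the validating base class from this issue. The parameter names `one`, `extra`, and `another` are hypothetical.

```python
class Parameters:
    """Condensed copy of the validating base class from this issue."""

    def __init__(self, **kwargs):
        super().__setattr__('__dict__', kwargs)

    def __setattr__(self, name, value):
        known = self.__dict__
        if name not in known:
            raise AttributeError(f"no parameter named '{name}'")
        if known[name] is not None and not isinstance(value, type(known[name])):
            raise AttributeError(f"parameter '{name}' has the wrong type")
        known[name] = value


p = Parameters(one=1)

# Loophole 1: call the un-customized setter, the same way __init__ does.
object.__setattr__(p, 'extra', 3.0)

# Loophole 2: write into the instance dictionary directly.
p.__dict__['another'] = 'value'

# Once injected, the new names are validated like any other parameter.
p.extra = 4.0
```

Both routes require deliberately reaching past normal attribute assignment, which fits the point that nobody is likely to stumble on them by accident.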