Open PeterDSteinberg opened 7 years ago
This is a rough draft of a spec for a Pipeline or estimator/transformer:
class SpecMixinBaseEstimator:
    # Template for the dotted path of the module holding the class
    _root = 'elm.pipeline.steps.{}'

    @property
    def spec(self):
        # Prefer an explicitly wrapped class (_cls) over the instance's class
        _cls = getattr(self, '_cls', None)
        if not _cls:
            _cls = self.__class__
        name = _cls.__name__
        # Second component of the defining module's dotted path
        module = _cls.__module__.split('.')[1]
        return dict(name=name,
                    module=self._root.format(module),
                    params=self.get_params())

    @classmethod
    def from_spec(cls, spec):
        modul, name, params = spec['module'], spec['name'], spec['params']
        parts = modul.split('.')
        elm = '.'.join(parts[:-1])
        # __import__ returns the top-level package, so walk down with getattr
        sk_module = __import__(elm, globals(), locals())
        for p in parts[1:]:
            sk_module = getattr(sk_module, p)
        return getattr(sk_module, name)(**params)


class PipelineSpecMixin(SpecMixinBaseEstimator):

    @property
    def spec(self):
        steps = [[name, step.spec] for name, step in self.steps]
        spec = super(PipelineSpecMixin, self).spec
        spec['steps'] = steps
        return spec

    @classmethod
    def from_spec(cls, spec):
        spec = spec.copy()
        from_spec = super(PipelineSpecMixin, cls).from_spec
        steps = [(name, from_spec(step_spec))
                 for name, step_spec in spec.pop('steps')]
        # Assumed intent: pass the rebuilt steps to the Pipeline as a param
        spec['params'] = dict(spec['params'], steps=steps)
        return from_spec(spec)
Those mixins would be used on elm.pipeline.steps.*.* classes and Pipeline, respectively.
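To make the intended round trip concrete, here is a minimal sketch of the same name/module/params idea using importlib.import_module in place of the __import__ plus getattr loop. It is not Elm code: fractions.Fraction from the standard library stands in for an elm.pipeline.steps class, and the module-level helpers to_spec and from_spec are hypothetical names chosen for illustration.

```python
import importlib
from fractions import Fraction


def to_spec(obj, params):
    """Describe obj as a plain dict: class name, module path, init params."""
    cls = obj.__class__
    return dict(name=cls.__name__, module=cls.__module__, params=params)


def from_spec(spec):
    """Import spec['module'], look up spec['name'], and call it with params."""
    module = importlib.import_module(spec['module'])
    return getattr(module, spec['name'])(**spec['params'])


# Fraction is only a stand-in for an estimator; a real estimator would
# supply its params via get_params() rather than passing them by hand.
original = Fraction(1, 3)
spec = to_spec(original, dict(numerator=1, denominator=3))
rebuilt = from_spec(spec)
```

The spec here is a plain dict of strings and numbers, which is what would make it straightforward to dump to text later.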
Also I wrote this wiki page about the idea (let's adapt that over time or transition it to main docs for Elm as the work is completed).
Elm PR #192 adds elm.mldataset.serialize_mixin for serialization with dill for models that have been initialized and/or fit. We also want to provide a means of plain-text (YAML for now) serialization of estimators (and Pipelines) so that we have param data structures. param allows input validation and structuring of inputs for UIs in Bokeh and other tools, as needed in later UI-related work of Phase I. Feel free to split this param spec work into separate issues/PRs as the work gets done.

TODO:
- to_spec and from_spec methods for each estimator, where to/from spec means to return/read a text specification of an estimator, YAML format by default:
  - func: Callable such as elm.pipeline.steps.linear_model.LinearRegression
  - args: List - positional arguments to func
  - kwargs: Keyword arguments to func, such as fit_intercept as in the example - kwargs that can go to set_params or __init__ of func
- Handle BaseEstimator-like (by inheritance) and BaseComposition-like classes separately, i.e. most estimators/transformers have a common base callable for how they do to/from spec, and special cases like EaSearchCV, Pipeline, and others are handled separately.
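The func/args/kwargs layout and the split between the common case and composite special cases could look roughly like the sketch below. Everything in it is an assumption made for illustration: the resolve and from_spec helpers are hypothetical, a 'steps' key is used as the marker for the composite case, fractions.Fraction and collections.OrderedDict stand in for a leaf estimator and a Pipeline, and the standard-library json module stands in for YAML so the example has no third-party dependency (with PyYAML installed, yaml.safe_dump and yaml.safe_load would presumably take its place).

```python
import importlib
import json


def resolve(dotted):
    """Import and return the callable named by a dotted path."""
    module_name, _, attr = dotted.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)


def from_spec(spec):
    """Build an object from a func/args/kwargs spec.

    A spec with a 'steps' key is treated as the composite special case:
    each step is built first, then passed to func as (name, object) pairs.
    """
    func = resolve(spec['func'])
    args = list(spec.get('args', []))
    kwargs = dict(spec.get('kwargs', {}))
    if 'steps' in spec:
        steps = [(name, from_spec(step)) for name, step in spec['steps']]
        return func(steps, *args, **kwargs)
    return func(*args, **kwargs)


# Stand-ins: Fraction plays a leaf estimator, OrderedDict plays a Pipeline.
leaf = {'func': 'fractions.Fraction',
        'args': [],
        'kwargs': {'numerator': 1, 'denominator': 3}}
composite = {'func': 'collections.OrderedDict',
             'steps': [['first', leaf],
                       ['second', {'func': 'fractions.Fraction',
                                   'args': [2, 5],
                                   'kwargs': {}}]]}

text = json.dumps(composite, indent=2)   # plain-text form of the spec
rebuilt = from_spec(json.loads(text))    # read it back and build the objects
```

Keeping the composite branch inside one function is only for brevity; in the mixin design above, that branch would live in the Pipeline-specific from_spec instead.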