ContinuumIO / elm

Phase I & part of Phase II of NASA SBIR - Parallel Machine Learning on Satellite Data
http://ensemble-learning-models.readthedocs.io

Estimator serialization - before or after fitting - work with `param` #208

Open PeterDSteinberg opened 7 years ago

PeterDSteinberg commented 7 years ago

Elm PR #192 adds elm.mldataset.serialize_mixin for serialization with dill of models that have been initialized and/or fit. We also want to provide a means of plain-text (YAML for now) serialization of estimators (and Pipelines) so that we have:

TODO:

PeterDSteinberg commented 7 years ago

This is a rough draft of the spec idea for a Pipeline or an estimator/transformer:

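import importlib


class SpecMixinBaseEstimator:

    _root = 'elm.pipeline.steps.{}'

    @property
    def spec(self):
        # Prefer a `_cls` attribute when one is set, else the instance's class.
        _cls = getattr(self, '_cls', None)
        if not _cls:
            _cls = self.__class__
        # Second component of the defining module path,
        # e.g. 'sklearn.linear_model.base' -> 'linear_model'.
        module = _cls.__module__.split('.')[1]
        # deep=False keeps only constructor arguments, so that
        # from_spec can pass params straight back to the constructor.
        return dict(name=_cls.__name__,
                    module=self._root.format(module),
                    params=self.get_params(deep=False))

    @classmethod
    def from_spec(cls, spec):
        modul, name, params = spec['module'], spec['name'], spec['params']
        # import_module returns the leaf module itself, so there is no
        # need to walk the dotted path with getattr.
        elm_module = importlib.import_module(modul)
        return getattr(elm_module, name)(**params)


class PipelineSpecMixin(SpecMixinBaseEstimator):

    @property
    def spec(self):
        steps = [[name, step.spec] for name, step in self.steps]
        spec = super(PipelineSpecMixin, self).spec
        # Steps are stored as nested specs, not as estimator objects in params.
        spec['params'] = {k: v for k, v in spec['params'].items()
                          if k != 'steps'}
        spec['steps'] = steps
        return spec

    @classmethod
    def from_spec(cls, spec):
        spec = spec.copy()
        from_spec = super(PipelineSpecMixin, cls).from_spec
        # Rebuild each step from its own spec, then hand the steps to the
        # Pipeline constructor together with the remaining params.
        steps = [(name, from_spec(step_spec))
                 for name, step_spec in spec.pop('steps')]
        spec['params'] = dict(spec['params'], steps=steps)
        return from_spec(spec)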

Those mixins would be used on elm.pipeline.steps.*.* classes and Pipeline, respectively.
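
To make the intended round trip concrete, here is a minimal sketch that runs without elm installed. Everything in it is an assumption for illustration: `SpecMixin` is a simplified stand-in for the draft mixin, `ToyScaler` is a hypothetical transformer, the `registry` dict replaces the import of `elm.pipeline.steps.*`, and JSON from the standard library stands in for YAML.

```python
import json


class SpecMixin:
    """Simplified stand-in for the draft SpecMixinBaseEstimator."""

    # Hypothetical registry used instead of importing 'elm.pipeline.steps.*',
    # so that the sketch runs without elm installed.
    registry = {}

    @property
    def spec(self):
        return {"name": type(self).__name__, "params": self.get_params()}

    @classmethod
    def from_spec(cls, spec):
        # Look up the class by name and call it with the stored params.
        return cls.registry[spec["name"]](**spec["params"])


class ToyScaler(SpecMixin):
    """Hypothetical transformer with a scikit-learn style get_params."""

    def __init__(self, with_mean=True, scale=1.0):
        self.with_mean = with_mean
        self.scale = scale

    def get_params(self, deep=True):
        return {"with_mean": self.with_mean, "scale": self.scale}

    def fit(self, values):
        # Fitted state lives outside the constructor parameters.
        self.mean_ = sum(values) / len(values)
        return self


SpecMixin.registry["ToyScaler"] = ToyScaler

fitted = ToyScaler(with_mean=False, scale=2.5).fit([1.0, 2.0, 3.0])
text = json.dumps(fitted.spec, sort_keys=True)  # plain-text form of the spec
restored = SpecMixin.from_spec(json.loads(text))

print(text)
print(restored.get_params() == fitted.get_params())  # True
print(hasattr(restored, "mean_"))                    # False
```

Note that the spec carries only constructor parameters, so the restored object is unfitted (`mean_` is gone). Keeping fitted state would presumably remain the job of the dill-based serialization.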

PeterDSteinberg commented 7 years ago

Also I wrote this wiki page about the idea (let's adapt that over time or transition it to main docs for Elm as the work is completed).