Closed arita37 closed 3 years ago
I think your code would be cleaner and shorter if it used Neuraxle. Something like:
data = load_data()
p = Pipeline([
    FillNAValues(),
    TruncatedSVD(n_components=n_components, n_iter=7, random_state=42),
    YourDataCheckpoint()
])
p.fit(*data)
p.save(...)
@arita37 You would also benefit from reading this page: https://www.neuraxle.org/stable/scikit-learn_problems_solutions.html#
Thanks.
Is FillNAValues() a class or a function?
How is the pipeline serialized on disk?
FillNAValues is a class like this:
import numpy as np
from neuraxle.base import BaseStep, NonFittableMixin


class FillNAValues(NonFittableMixin, BaseStep):
    """
    Step that replaces None with default value in numpy arrays.
    """

    def __init__(self, default_value):
        BaseStep.__init__(self)
        NonFittableMixin.__init__(self)
        self.default_value = default_value

    def transform(self, data_inputs):
        # `== None` is intentional: it compares elementwise, whereas
        # `is None` would test the array object itself.
        new_di = np.where(data_inputs == None, self.default_value, data_inputs)
        return new_di
To save the pipeline on disk, we don't have documentation examples right now except the official unit tests of Neuraxle.
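Independently of Neuraxle's own saving mechanism, which this thread does not show, the general idea of persisting a step can be sketched in plain Python: write the parameters needed to rebuild the step, then reconstruct it from the file. The FillValue class, its save and load methods, and the JSON file name below are all hypothetical and are not part of Neuraxle.

```python
import json
import os
import tempfile


class FillValue:
    """Hypothetical plain-Python step that replaces None with a default value."""

    def __init__(self, default_value):
        self.default_value = default_value

    def transform(self, data_inputs):
        return [self.default_value if x is None else x for x in data_inputs]

    def save(self, path):
        # Persist only the parameters needed to rebuild the step.
        with open(path, "w") as f:
            json.dump({"default_value": self.default_value}, f)

    @classmethod
    def load(cls, path):
        # Rebuild a new instance from the saved parameters.
        with open(path) as f:
            return cls(**json.load(f))


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "fill_value.json")
    FillValue(default_value=0).save(path)
    restored = FillValue.load(path)

filled = restored.transform([1, None, 3])
```

A step with learned state (fitted means, model weights) would need to save that state as well, which is why real libraries usually rely on a binary serializer rather than JSON.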
Every transformation is a class?
This is where the problems arise between functional ETL and class-based ETL.
Check Spark and universal UDFs...
(Sent by email in reply to Guillaume Chevalier's comment of Feb 18, 2020, 17:33, shown above.)
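One way to reconcile the functional and class-based styles raised here is a thin adapter that exposes a plain function through a transform method. This is only a sketch under stated assumptions: FunctionStep, drop_negatives and run_steps are hypothetical names, not Neuraxle or Spark APIs.

```python
class FunctionStep:
    """Hypothetical adapter: exposes a plain function through a transform method."""

    def __init__(self, func):
        self.func = func

    def fit(self, data_inputs, expected_outputs=None):
        # Nothing to learn: the wrapped function is stateless.
        return self

    def transform(self, data_inputs):
        return self.func(data_inputs)


def drop_negatives(values):
    """Plain function written in a functional ETL style."""
    return [v for v in values if v >= 0]


def run_steps(steps, data_inputs):
    # Fit each step on its input, then pass the transformed data to the next step.
    for step in steps:
        data_inputs = step.fit(data_inputs).transform(data_inputs)
    return data_inputs


result = run_steps([FunctionStep(drop_negatives), FunctionStep(sorted)], [3, -1, 2])
```

With such an adapter, stateless functions and stateful classes can sit in the same list of steps, and only the steps that need to learn something have to be written as full classes.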
Every transformation is a class?
Yes, exactly like in scikit-learn. A transformation can also define a fit method, called before the transform method, to learn from the data before doing the transformation. Here, in the FillNAValues example, we also inherit from NonFittableMixin, which removes the requirement of defining a fit method. By contrast, a normalizer that works by fitting a mean and a variance (or standard deviation) would first need to fit before transforming the data.
The TruncatedSVD is also a class that can fit and then transform. Scikit-learn works like that.
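The fit-then-transform contract described above can be sketched without any library. This is a minimal illustration of the normalizer mentioned in the reply, assuming list inputs and a population standard deviation; MeanStdNormalizer is a hypothetical name and does not inherit from Neuraxle's BaseStep.

```python
import statistics


class MeanStdNormalizer:
    """Hypothetical step: learns a mean and a standard deviation, then rescales."""

    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, data_inputs, expected_outputs=None):
        # Learn the statistics from the data passed to fit.
        self.mean = statistics.fmean(data_inputs)
        self.std = statistics.pstdev(data_inputs)
        return self

    def transform(self, data_inputs):
        if self.mean is None:
            raise RuntimeError("fit must be called before transform")
        # Guard against a zero standard deviation (constant input).
        scale = self.std if self.std > 0 else 1.0
        return [(x - self.mean) / scale for x in data_inputs]


normalizer = MeanStdNormalizer().fit([1.0, 2.0, 3.0, 4.0, 5.0])
normalized = normalizer.transform([1.0, 3.0, 5.0])
```

Unlike FillNAValues, this step cannot skip fit: calling transform first raises an error because the mean and standard deviation are not yet known.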
Neuraxle is a library for machine learning pipelines, and it is assumed that people can fit anything before transforming data. Some steps can fit many times in a row, for online learning for instance, which is an improvement over scikit-learn.
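Fitting many times in a row can be illustrated with a step whose fit refines its state on each call instead of resetting it. The sketch below uses an incremental mean update in plain Python; RunningMeanCenterer is a hypothetical name, and this is not how Neuraxle implements online learning.

```python
class RunningMeanCenterer:
    """Hypothetical step whose fit can be called repeatedly on new batches."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def fit(self, data_inputs, expected_outputs=None):
        # Incremental mean update: each call refines the state instead of resetting it.
        for x in data_inputs:
            self.count += 1
            self.mean += (x - self.mean) / self.count
        return self

    def transform(self, data_inputs):
        # Center the data on the mean learned so far.
        return [x - self.mean for x in data_inputs]


step = RunningMeanCenterer()
for batch in ([1.0, 2.0], [3.0, 4.0], [5.0]):
    step.fit(batch)
centered = step.transform([5.0])
```

After the three batches, the step holds the mean of all five values seen so far, as if it had been fit once on the full data.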
This is more of a philosophical question than an issue. Neuraxle was made with OO in mind, but it was nice to get your point of view, @arita37.
What do you think of this simplified pipeline? https://github.com/arita37/mlmodels/blob/dev/mlmodels/pipeline.py
Everything is serialized on disk.