Neuraxio / Neuraxle

The world's cleanest AutoML library ✨ - Do hyperparameter tuning with the right pipeline abstractions to write clean deep learning production pipelines. Let your pipeline steps have hyperparameter spaces. Design steps in your pipeline like components. Compatible with Scikit-Learn, TensorFlow, and most other libraries, frameworks and MLOps environments.
https://www.neuraxle.org/
Apache License 2.0

Question: Simplified pipeline #265

Closed arita37 closed 3 years ago

arita37 commented 4 years ago

What do you think of this simplified pipeline? https://github.com/arita37/mlmodels/blob/dev/mlmodels/pipeline.py

Everything is serialized on disk.

guillaume-chevalier commented 4 years ago

I think your code would be cleaner and shorter if using Neuraxle. Something like:

data = load_data()

p = Pipeline([
    FillNAValues(), 
    TruncatedSVD(n_components=n_components, n_iter=7, random_state=42), 
    YourDataCheckpoint()
])

p.fit(*data)

p.save(...)

guillaume-chevalier commented 4 years ago

@arita37 You would also benefit from reading this page: https://www.neuraxle.org/stable/scikit-learn_problems_solutions.html#

arita37 commented 4 years ago

Thanks.

Is FillNAValues() a class or a function?

arita37 commented 4 years ago

How is the pipeline serialized on disk?

guillaume-chevalier commented 4 years ago

FillNAValues is a class like this:

import numpy as np
from neuraxle.base import BaseStep, NonFittableMixin

class FillNAValues(NonFittableMixin, BaseStep):
    """
    Step that replaces None with default value in numpy arrays.
    """

    def __init__(self, default_value):
        BaseStep.__init__(self)
        NonFittableMixin.__init__(self)
        self.default_value = default_value

    def transform(self, data_inputs):
        # An elementwise comparison is intended here, hence `== None` rather than `is None`.
        new_di = np.where(data_inputs == None, self.default_value, data_inputs)
        return new_di

To save the pipeline on disk, we don't have documentation examples right now except the official unit tests of Neuraxle.

arita37 commented 4 years ago

Is every transformation a class?

This is where problems arise between functional ETL and class-based ETL.

Have a look at Spark and universal UDFs...

guillaume-chevalier commented 4 years ago

"Every transformation is a class?"

Yes, exactly like in scikit-learn. A transformation can also define a fit method, called before transform, to learn from the data before transforming it. In the FillNAValues example, we also inherit from NonFittableMixin, which removes the requirement to define a fit method. By contrast, a normalizer that works by fitting a mean and a standard deviation would first need to fit before it can transform the data.
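To make the fit-then-transform idea concrete, here is a minimal sketch of such a normalizer in plain Python. It is not Neuraxle's or scikit-learn's actual code: the class name MeanStdNormalizer and its attributes are made up for illustration, and it only follows the fit/transform convention described above.

```python
import statistics


class MeanStdNormalizer:
    """Hypothetical step: learns a mean and std in fit(), applies them in transform()."""

    def __init__(self):
        self.mean_ = None
        self.std_ = None

    def fit(self, data_inputs, expected_outputs=None):
        # Learn the statistics from the data passed to fit() only.
        self.mean_ = statistics.fmean(data_inputs)
        self.std_ = statistics.pstdev(data_inputs)
        return self

    def transform(self, data_inputs):
        if self.mean_ is None:
            raise RuntimeError("fit() must be called before transform()")
        # Guard against constant input, where the std is zero.
        scale = self.std_ if self.std_ > 0 else 1.0
        return [(x - self.mean_) / scale for x in data_inputs]


normalizer = MeanStdNormalizer().fit([1.0, 2.0, 3.0, 4.0, 5.0])
normalized = normalizer.transform([3.0, 5.0])
```

Calling transform on an unfitted instance raises an error, which is the behavior a step like FillNAValues avoids by having nothing to learn in the first place.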

TruncatedSVD is also a class that can fit and then transform. Scikit-learn works like that.

Neuraxle is a library for machine learning pipelines, and it assumes that any step may need to fit before transforming data. Some steps can even fit many times in a row, for instance for online learning, which is an improvement over scikit-learn.
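As a rough illustration of fitting many times in a row, the sketch below keeps a running mean that each call to fit refines with a new batch. The class RunningMeanCenterer is hypothetical and written in plain Python; it is not how Neuraxle implements online learning, only an example of a step whose state survives between fit calls.

```python
class RunningMeanCenterer:
    """Hypothetical step whose fit() can be called on successive batches."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def fit(self, data_inputs, expected_outputs=None):
        # Update the running mean incrementally instead of restarting from scratch.
        for x in data_inputs:
            self.count += 1
            self.mean += (x - self.mean) / self.count
        return self

    def transform(self, data_inputs):
        # Center the inputs on the mean learned so far.
        return [x - self.mean for x in data_inputs]


step = RunningMeanCenterer()
for batch in ([1.0, 2.0], [3.0, 4.0], [5.0]):
    step.fit(batch)  # each call refines the state learned from earlier batches

centered = step.transform([3.0, 6.0])
```

After the three batches the step has seen five values, so its mean reflects all of them rather than only the last batch.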

alexbrillant commented 3 years ago

This is more of a philosophical question than an issue. Neuraxle was made with OO in mind, but it was nice to get your point of view, @arita37.
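For readers coming from a functional ETL style, one possible bridge between the two approaches is a thin class wrapper around a plain function. The sketch below is an assumption for illustration only: FunctionStep and fill_none_with_zero are hypothetical names, not Neuraxle classes, and the loop stands in for what a real pipeline object would do.

```python
class FunctionStep:
    """Hypothetical adapter exposing a plain function through fit/transform."""

    def __init__(self, func):
        self.func = func

    def fit(self, data_inputs, expected_outputs=None):
        # A stateless function has nothing to learn.
        return self

    def transform(self, data_inputs):
        return self.func(data_inputs)


def fill_none_with_zero(values):
    # Plain function in the functional ETL style.
    return [0 if v is None else v for v in values]


steps = [FunctionStep(fill_none_with_zero), FunctionStep(sorted)]

data = [3, None, 1]
for current_step in steps:
    # Fit each step, then pass its output on to the next one.
    data = current_step.fit(data).transform(data)
```

The functions stay reusable on their own, while the wrapper gives them the same interface as steps that do need to learn from data.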