HK3-Lab-Team / pytrousse

PyTrousse collects into one toolbox a set of data wrangling procedures tailored for composing reproducible analytics pipelines.
Apache License 2.0

FeatureOperation attribute named 'encoder' should be an instance, not a function #12

Closed: lorenz-gorini closed this issue 4 years ago

lorenz-gorini commented 4 years ago

FeatureOperation is a class that stores a specific operation (encoding, bin_splitting, ...) applied to a DataFrameWithInfo. Its 'encoder' attribute should store the encoder instance used for the operation (such as sklearn.preprocessing.OneHotEncoder or OrdinalEncoder). This should be an instance that has already been created and fitted, i.e. one whose .fit() method has been called.
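To make the class/instance distinction concrete, here is a minimal sketch. It uses a hypothetical stand-in encoder, MinimalOrdinalEncoder (not part of pytrousse or sklearn), so that it runs without extra dependencies; in practice the attribute would hold something like a fitted sklearn OrdinalEncoder. Only the fitted instance carries the learned state the operation depends on.

```python
from typing import Dict, Hashable, List


class MinimalOrdinalEncoder:
    """Hypothetical stand-in for an encoder such as sklearn's OrdinalEncoder."""

    def __init__(self) -> None:
        self.categories_: List[Hashable] = []

    def fit(self, values: List[Hashable]) -> "MinimalOrdinalEncoder":
        # Learn the sorted set of categories; return self, as sklearn estimators do
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values: List[Hashable]) -> List[int]:
        # Map each value to the index of the category learned during fit()
        mapping: Dict[Hashable, int] = {c: i for i, c in enumerate(self.categories_)}
        return [mapping[v] for v in values]


# What the 'encoder' attribute should hold: a fitted instance...
fitted = MinimalOrdinalEncoder().fit(["b", "a", "c", "a"])
# ...not the class itself, which has learned nothing
encoder_class = MinimalOrdinalEncoder

print(isinstance(fitted, type))         # False: an instance
print(isinstance(encoder_class, type))  # True: a class
print(fitted.transform(["c", "a"]))     # [2, 0]
```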

Currently, however, when we create a FeatureOperation instance, the encoder attribute is expected to be one of the values listed in the EncodingFunctions Enum, and those values are classes, not instances. For the same reason, the __eq__ method of the FeatureOperation class mistakenly checks whether the two FeatureOperation instances have the same encoder, instead of checking whether their encoder attributes are instances of the same class.

I therefore propose changing the type hints and the __eq__ method accordingly.
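A rough sketch of what the proposed change could look like follows. This is a simplified, hypothetical FeatureOperation: the field names original_columns and derived_columns, the Encoder protocol, and DummyEncoder/OtherEncoder are illustrative assumptions, not pytrousse's actual code. The type hint describes an encoder *instance* through its fit/transform methods, and __eq__ compares the encoders' classes rather than the encoder objects.

```python
from typing import Any, Optional, Protocol, Sequence


class Encoder(Protocol):
    """Structural type for a fitted encoder instance (e.g. an sklearn encoder)."""

    def fit(self, X: Any, y: Any = None) -> Any: ...

    def transform(self, X: Any) -> Any: ...


class FeatureOperation:
    """Simplified, hypothetical version of FeatureOperation."""

    def __init__(
        self,
        original_columns: Sequence[str],
        derived_columns: Optional[Sequence[str]] = None,
        encoder: Optional[Encoder] = None,  # a fitted instance, not a class
    ) -> None:
        self.original_columns = tuple(original_columns)
        self.derived_columns = (
            tuple(derived_columns) if derived_columns is not None else None
        )
        self.encoder = encoder

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, FeatureOperation):
            return NotImplemented
        # Compare encoder classes, not encoder objects: two separately fitted
        # instances of the same class count as the same operation
        return (
            self.original_columns == other.original_columns
            and self.derived_columns == other.derived_columns
            and type(self.encoder) is type(other.encoder)
        )


class DummyEncoder:
    """Hypothetical encoder used only for this example."""

    def fit(self, X: Any, y: Any = None) -> "DummyEncoder":
        return self

    def transform(self, X: Any) -> Any:
        return X


class OtherEncoder(DummyEncoder):
    """A different (hypothetical) encoder class."""


op_a = FeatureOperation(["color"], ["color_enc"], DummyEncoder().fit(["red"]))
op_b = FeatureOperation(["color"], ["color_enc"], DummyEncoder().fit(["blue"]))
op_c = FeatureOperation(["color"], ["color_enc"], OtherEncoder().fit(["red"]))

print(op_a == op_b)  # True: different instances, same encoder class
print(op_a == op_c)  # False: different encoder classes
```

One consequence of this design: comparing only the class ignores fitted state such as learned categories, so two operations fitted on different data still compare equal.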