lorenz-gorini closed this issue 3 years ago.
During analysis, pytrousse should not perform conversion to category (this would make the column a `pd.Categorical` instance instead of a `pd.Series`, and these are not numpy arrays).
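A minimal illustration of the type difference, using plain pandas and made-up data (pytrousse itself is not involved). `pd.Categorical(...)` returns a `Categorical`. `astype("category")` keeps a `Series`, but its `.values` is then a `Categorical`. In neither case do you get a NumPy array.

```python
import numpy as np
import pandas as pd

# A plain string column, as it might appear in a DataFrame (illustrative data)
col = pd.Series(["cat", "dog", "cat"], name="species")

# Wrapping the column with pd.Categorical returns a Categorical, not a Series
as_categorical = pd.Categorical(col)

# astype("category") keeps a Series, but with a categorical dtype
as_category_series = col.astype("category")

# .values on the original column is a NumPy array...
plain_values = col.values
# ...whereas .values on the categorical Series is a pd.Categorical
category_values = as_category_series.values

print(type(as_categorical).__name__)            # Categorical
print(type(as_category_series).__name__)        # Series
print(isinstance(plain_values, np.ndarray))     # True
print(isinstance(category_values, np.ndarray))  # False
```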
I am not sure this should be a concern of pytrousse... shouldn't it be handled by whoever uses the data afterwards?
I am not sure: the conversion makes categorical data behave differently from the other data structures, so it could be better to perform it through a `FeatureOperation`, so that the user is fully aware of it.
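A minimal sketch of the idea, under stated assumptions: the conversion is wrapped in an object that is applied to the data and appended to a history, so the change is recorded instead of happening silently. `RecordedOperation`, `TrackedFrame` and `apply` are hypothetical names for illustration only. They are not pytrousse's actual `FeatureOperation` API, and the data is made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List

import pandas as pd


@dataclass
class RecordedOperation:
    """Hypothetical stand-in for an operation-tracking class (not the pytrousse API)."""
    name: str
    column: str
    func: Callable[[pd.Series], pd.Series]


@dataclass
class TrackedFrame:
    """Hypothetical DataFrame wrapper that logs every operation applied to it."""
    df: pd.DataFrame
    history: List[RecordedOperation] = field(default_factory=list)

    def apply(self, op: RecordedOperation) -> "TrackedFrame":
        # Work on a copy so the original data stays untouched
        new_df = self.df.copy()
        new_df[op.column] = op.func(new_df[op.column])
        # Return a new wrapper whose history also includes this operation
        return TrackedFrame(new_df, self.history + [op])


# The category conversion becomes an explicit, recorded step
to_category = RecordedOperation(
    name="to_category",
    column="breed",
    func=lambda s: s.astype("category"),
)

data = TrackedFrame(pd.DataFrame({"breed": ["beagle", "boxer", "beagle"]}))
converted = data.apply(to_category)

print(converted.df["breed"].dtype)            # category
print([op.name for op in converted.history])  # ['to_category']
```

Returning a new wrapper rather than mutating in place keeps the pre-conversion data and its (empty) history available for comparison.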
In my personal opinion, this points out two levels of issues that need to be taken care of:

* (A) Operations like `pd.Categorical` (or `astype("category")`) should be part of a `FeatureOperation`; otherwise they won't be recorded.
* (B) In terms of API and processing, you should not rely on the return type depending on whether it is a `Categorical` or a `Series`. This is simply bad practice. You should use `to_numpy` instead, to make sure you are dealing with `numpy` arrays whenever you need/expect them.

my2c
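A short illustration of point (B), using plain pandas and made-up data: `to_numpy()` returns an `np.ndarray` whether the input is a regular `Series`, a `Series` with `category` dtype, or a `pd.Categorical`, so the calling code does not need to branch on which container it received.

```python
import numpy as np
import pandas as pd

# Three containers a processing step might receive (illustrative data)
inputs = {
    "numeric Series": pd.Series([1.0, 2.5, 3.0]),
    "categorical Series": pd.Series(["a", "b", "a"]).astype("category"),
    "Categorical": pd.Categorical(["a", "b", "a"]),
}

for label, data in inputs.items():
    # to_numpy() exists on all three and returns an np.ndarray,
    # unlike .values, which yields a Categorical for categorical data
    arr = data.to_numpy()
    print(f"{label}: {type(arr).__name__}")
```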
Right. Thanks! In fact, during the Reference Interval model computation I was using `.values` to get a numpy array, but `pd.Categorical` does not have this attribute. `to_numpy()`, instead, works on both `pd.Series` and `pd.Categorical`.
Removed column conversion to category
Replaced hardcoded values with arguments for `breed_specific_bin_splitting`