feature-engine / feature_engine

Feature engineering package with sklearn like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License

Avoid Pandas inplaces #624

Closed luismavs closed 1 year ago

luismavs commented 1 year ago

Hi

Parts of the code throw pandas DeprecationWarnings when run with current pandas (>1.5), usually related to inplace usage. Running pytest locally with py310 and pandas 1.5.3 yielded 4 deprecation warnings and 9 future warnings. Some examples:

tests/test_imputation/test_categorical_imputer.py::test_variables_cast_as_category_missing
  /Users/luis/code/feature_engine/feature_engine/imputation/categorical.py:239: DeprecationWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)
    X.fillna(self.imputer_dict_, inplace=True)

tests/test_imputation/test_categorical_imputer.py::test_variables_cast_as_category_missing
  /Users/luis/code/feature_engine/tests/test_imputation/test_categorical_imputer.py:248: FutureWarning: The inplace parameter in pandas.Categorical.add_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
    X_reference["City"].cat.add_categories("Missing", inplace=True)

tests/test_imputation/test_categorical_imputer.py::test_variables_cast_as_category_frequent
  /Users/luis/code/feature_engine/feature_engine/imputation/base_imputer.py:63: DeprecationWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)
    X.fillna(value=self.imputer_dict_, inplace=True)

tests/test_transformation/test_power_transformer.py::test_inverse_transform_exp_no_default[4]
  /Users/luis/code/feature_engine/feature_engine/transformation/power.py:152: DeprecationWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)
    X.loc[:, self.variables_] = np.power(X.loc[:, self.variables_], 1 / self.exp)

tests/test_transformation/test_reciprocal_transformer.py::test_automatically_find_variables
  /Users/luis/code/feature_engine/feature_engine/transformation/reciprocal.py:137: DeprecationWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)
    X.loc[:, self.variables_] = X.loc[:, self.variables_].astype("float")

They seem to be related to inplace behaviour, either implicit or via keyword, which has been deprecated for some of the pandas API calls used.
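To make the two cases concrete, here is a minimal sketch of how each pattern from the warnings could be rewritten. The DataFrame, the imputer_dict mapping and the variables list are made-up stand-ins for the transformer attributes, not the actual feature_engine code, and the column-label assignment simply follows the suggestion in the warning message itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data and names, for illustration only.
X = pd.DataFrame({"a": [1.0, np.nan, 4.0], "b": [np.nan, 9.0, 16.0]})
imputer_dict = {"a": 0.0, "b": 1.0}
variables = ["a", "b"]

# Keyword inplace, as in the warnings:
#     X.fillna(imputer_dict, inplace=True)
# Reassigning the returned frame avoids the inplace keyword.
X = X.fillna(value=imputer_dict)

# Implicit inplace, as in the warnings:
#     X.loc[:, variables] = np.power(X.loc[:, variables], 0.5)
# Assigning by column label is the form the warning message recommends.
X[variables] = np.power(X[variables], 0.5)
```

Both rewrites bind new objects to the existing names, so a transformer that copies X at the start of transform() would not need any further change under this approach.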

See this ongoing discussion for the direction pandas may be headed, deprecating inplace:

https://github.com/pandas-dev/pandas/blob/57390ada100466dac777e5b66d5a4f2a72700c38/web/pandas/pdeps/0008-inplace-methods-in-pandas.md

Feature_engine should be updated to use current pandas best practices.

We could start by refactoring the code to remove inplace from calls to pandas APIs where it has already been deprecated in the current version (1.5.3), such as cat.add_categories(), and by using best practices such as df.astype({cols_to_cast: new_type}).
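As a rough illustration of that first step, the sketch below shows the two replacements on a toy DataFrame. The column names and values are invented for the example; only cat.add_categories() and astype() with a column-to-dtype mapping come from the proposal above.

```python
import pandas as pd

# Hypothetical data, for illustration only.
X = pd.DataFrame(
    {
        "City": pd.Categorical(["London", None, "Lisbon"]),
        "n": [1, 2, 3],
    }
)

# Deprecated form:
#     X["City"].cat.add_categories("Missing", inplace=True)
# add_categories returns a new Series, so assign it back to the column.
X["City"] = X["City"].cat.add_categories("Missing")
X["City"] = X["City"].fillna("Missing")

# Cast selected columns with a {column: dtype} mapping instead of
# assigning through X.loc[:, cols] = X.loc[:, cols].astype(...).
X = X.astype({"n": "float"})
```

The new category has to be added before fillna, because a categorical column only accepts fill values that are already among its categories.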

solegalli commented 1 year ago

This is the never-ending story with compatibility updates :_(

We should look into this asap.

luismavs commented 1 year ago

Hi,

I can look into this and have a PR soon.

solegalli commented 1 year ago

Awesome! Thank you!