HK3-Lab-Team / pytrousse

PyTrousse collects into one toolbox a set of data wrangling procedures tailored for composing reproducible analytics pipelines.
Apache License 2.0

Better column_list_by_type performance, with tests and DataFrameWithInfo docstrings #3

Closed lorenz-gorini closed 4 years ago

lorenz-gorini commented 4 years ago

Refactored many docstrings to a common standard and added explanations. Reimplemented the column_list_by_type method of the DataFrameWithInfo class to improve performance. Added checks to find_operation_in_column so that an error is raised when more than one operation is found. Changed the default values of the FeatureOperation class.
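
For reviewers, here is a minimal sketch of the kind of ambiguity check described for find_operation_in_column. It is not the actual pytrousse code: `Operation`, its fields, `MultipleOperationsFoundError`, and `find_single_operation` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Operation:
    """Hypothetical stand-in for FeatureOperation; the fields are assumptions."""

    original_column: str
    operation_type: str


class MultipleOperationsFoundError(Exception):
    """Hypothetical error raised when a lookup matches more than one operation."""


def find_single_operation(
    operations: List[Operation], column: str, operation_type: str
) -> Optional[Operation]:
    """Return the only matching operation, None if there is no match.

    Raises MultipleOperationsFoundError when the lookup is ambiguous.
    """
    matches = [
        op
        for op in operations
        if op.original_column == column and op.operation_type == operation_type
    ]
    if len(matches) > 1:
        raise MultipleOperationsFoundError(
            f"{len(matches)} operations of type {operation_type!r} "
            f"found on column {column!r}; expected at most one"
        )
    return matches[0] if matches else None
```

Failing loudly on an ambiguous match, instead of silently returning the first one, keeps the caller from relying on the wrong operation.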

Added a DataFrame mock with various dtypes, and another that maps column names to their column types. Added SeriesMock.series_by_type to generate pandas Series of various types, covering many scenarios across the dtypes that pandas supports (see docs https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#dtypes ). Added tests for the _find_columns_by_type and _split_columns_by_type_parallel functions from the dataframe_with_info.py module (related to column_list_by_type, a property of DataFrameWithInfo).
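
As a rough illustration of the testing approach, the sketch below builds one pandas Series per dtype and then groups the columns of the resulting DataFrame by kind. The helpers `make_series_by_type` and `split_columns_by_kind` are hypothetical and only approximate what SeriesMock.series_by_type and the column-splitting functions might do. The grouping into "numeric", "bool" and "other" is also an assumption, not the library's actual categories.

```python
import pandas as pd


def make_series_by_type(series_type: str) -> pd.Series:
    """Return a small Series of the requested kind (hypothetical helper)."""
    samples = {
        "bool": pd.Series([True, False, True], dtype="bool"),
        "int": pd.Series([1, 2, 3], dtype="int64"),
        "float": pd.Series([0.5, 1.5, 2.5], dtype="float64"),
        "str": pd.Series(["a", "b", "c"], dtype="object"),
        "datetime": pd.to_datetime(
            pd.Series(["2020-01-01", "2020-01-02", "2020-01-03"])
        ),
        "category": pd.Series(["x", "y", "x"], dtype="category"),
    }
    return samples[series_type]


def split_columns_by_kind(df: pd.DataFrame) -> dict:
    """Group column names by dtype kind (assumed grouping, for illustration)."""
    # select_dtypes("number") covers integer and float columns but not bool.
    numeric = list(df.select_dtypes(include="number").columns)
    boolean = list(df.select_dtypes(include="bool").columns)
    other = [c for c in df.columns if c not in numeric and c not in boolean]
    return {"numeric": numeric, "bool": boolean, "other": other}


# Build a mock DataFrame with one column per supported kind.
kinds = ["bool", "int", "float", "str", "datetime", "category"]
mock_df = pd.DataFrame({f"{kind}_col": make_series_by_type(kind) for kind in kinds})
columns_by_kind = split_columns_by_kind(mock_df)
```

Keeping a fixed mapping from column name to expected kind next to the mock DataFrame makes the assertions in each test explicit.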