HK3-Lab-Team / pytrousse

PyTrousse collects into one toolbox a set of data wrangling procedures tailored for composing reproducible analytics pipelines.
Apache License 2.0

Add OneHotEncoder FeatureOperation #98

Closed alessiamarcolini closed 3 years ago

alessiamarcolini commented 3 years ago

fixes #52

lorenz-gorini commented 3 years ago

Sorry, I still need to take a look. I should be able to do that after today's meeting. I also remembered that if a FeatureOperation has all original columns (columns attribute) that are in metadata_cols, it may be reasonable to automatically include the derived_columns in the metadata_cols attribute of Dataset (columns derived from metadata columns can be considered metadata columns as well, can't they?). We could discuss this possibility (this was mostly a reminder for me) :D
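To make the proposed rule concrete, here is a minimal sketch of it. It is not the code in trousse/dataset.py: the function name `propagate_metadata_cols` and its signature are hypothetical, and it only assumes that a FeatureOperation exposes its original columns and its derived columns as collections of column names.

```python
from typing import Iterable, Set


def propagate_metadata_cols(
    metadata_cols: Iterable[str],
    original_columns: Iterable[str],
    derived_columns: Iterable[str],
) -> Set[str]:
    """Return the metadata columns after applying one operation.

    Hypothetical helper, not part of the pytrousse API. The derived
    columns are added only when every original column of the operation
    is already a metadata column.
    """
    metadata = set(metadata_cols)
    original = set(original_columns)
    # An operation with no original columns is left out on purpose:
    # there is nothing to derive the metadata status from.
    if original and original <= metadata:
        return metadata | set(derived_columns)
    return metadata


# All original columns are metadata, so the derived ones become metadata too.
all_meta = propagate_metadata_cols(
    {"patient_id", "site"}, ["site"], ["site_A", "site_B"]
)

# One original column ("age") is not metadata, so nothing is added.
mixed = propagate_metadata_cols(
    {"patient_id"}, ["age", "patient_id"], ["age_bin"]
)
```

The subset check means that an operation mixing metadata and feature columns never promotes its output to metadata, which seems the safer default.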

alessiamarcolini commented 3 years ago

that if a FeatureOperation has all original columns (columns attribute) that are in metadata_cols, it may be reasonable to automatically include the derived_columns in the metadata_cols attribute of Dataset

we are already doing this here: https://github.com/HK3-Lab-Team/pytrousse/blob/master/src/trousse/dataset.py#L638 😉

lorenz-gorini commented 3 years ago

Great. One less thing to discuss and do. Only 2999 left!