feature-engine / feature_engine

Feature engineering package with sklearn like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License

Add DatetimeOrdinal #818

Open michaelrussell4 opened 2 days ago

michaelrussell4 commented 2 days ago

It'd be nice to have the ability to convert datetime columns to ordinal values. DatetimeFeatures is nice when one wants to extract the year, month, day, etc., but sometimes it's desirable to have the date simply as an ordinal value.

Here's an example I've used in code before:

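import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DatetimeOrdinal(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn.
        return self

    def transform(self, X):
        # Map each timestamp to its proleptic Gregorian ordinal; missing values are skipped.
        return X.apply(lambda x: x.map(pd.Timestamp.toordinal, na_action='ignore'))

    def inverse_transform(self, X):
        # Cast to int first: columns with missing values are float, and fromordinal needs an int.
        return X.apply(lambda x: x.map(lambda v: pd.Timestamp.fromordinal(int(v)), na_action='ignore'))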
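
For readers unfamiliar with the ordinal convention, here is a minimal standard-library sketch of what the mapping above produces. It relies on the fact that pd.Timestamp subclasses datetime.datetime and so shares its toordinal/fromordinal behaviour. The sample timestamp and variable names are made up for illustration.

```python
from datetime import datetime

# Illustrative timestamp with a time-of-day component (not from the issue).
stamp = datetime(2024, 3, 15, 13, 45)

# toordinal() counts whole days in the proleptic Gregorian calendar,
# with 0001-01-01 defined as day 1.
ordinal = stamp.toordinal()
day_one = datetime(1, 1, 1).toordinal()

# Consecutive calendar days differ by exactly 1.
gap = datetime(2024, 3, 16).toordinal() - ordinal

# fromordinal() rebuilds the date at midnight, so the time of day is lost.
restored = datetime.fromordinal(ordinal)

print(day_one, gap, restored)
```

One consequence worth noting: because the ordinal only counts days, an inverse transform built on fromordinal returns midnight of the original date rather than the original timestamp.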

solegalli commented 2 days ago

Thank you @michaelrussell4

I wasn't aware of this functionality.

When would it be useful to map dates to ordinal numbers? Do I understand correctly that the cardinality of the variable will still be high after this representation?

michaelrussell4 commented 1 day ago

Here are some reasons why this method can be useful:

The high cardinality of this method does have to be accounted for, probably best with a max-min normalisation (British spelling just for you 😉).
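
A rough sketch of what that normalisation could look like on ordinal values, using only the standard library. The helper name min_max_scale and the sample dates are hypothetical; in a real pipeline one would more likely fit something like scikit-learn's MinMaxScaler on the training data only.

```python
from datetime import date


def min_max_scale(values):
    """Rescale numbers to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Guard against a constant column, where the range would be zero.
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Illustrative dates, ten days apart (not from the issue).
dates = [date(2024, 1, 1), date(2024, 1, 11), date(2024, 1, 21)]
ordinals = [d.toordinal() for d in dates]

scaled = min_max_scale(ordinals)
print(scaled)  # [0.0, 0.5, 1.0]
```

The scaled values keep the ordering and relative spacing of the dates, which is the property that makes the ordinal representation useful as a single numeric trend feature.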

This technique isn't as popular as the DatetimeFeatures extraction you currently have, but I think it'd be worth considering adding.

solegalli commented 1 day ago

Thank you @michaelrussell4

We'll keep it on the radar :)

If you want to give it a go, you are welcome!

michaelrussell4 commented 22 hours ago

I surely will if I get a chance. Thanks for your work!