feature-engine / feature_engine

Feature engineering package with sklearn like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License

Add semi_month to Datetime Features #685

Open candalfigomoro opened 11 months ago

candalfigomoro commented 11 months ago

Is your feature request related to a problem? Please describe.
I need to create a "semi month" feature for semi-monthly data, just like "week" but for semi-months instead of weeks.

Describe the solution you'd like
Pandas supports the "SM" (semi-monthly, i.e. the 15th and the end of the month) and "SMS" (semi-monthly start, i.e. the 1st and the 15th of the month) frequencies, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
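To see where the two pandas frequencies place their boundaries, the snippet below generates a few dates with the offset classes behind the "SMS" and "SM" aliases (`pd.offsets.SemiMonthBegin` and `pd.offsets.SemiMonthEnd`, both anchored on the 15th by default). It uses the offset objects rather than the string aliases because recent pandas releases have been renaming month-end aliases (e.g. "SM" is deprecated in favour of "SME" in pandas 2.2); the year 2024 is an arbitrary choice.

```python
import pandas as pd

# Offset classes behind the "SMS" and "SM" aliases; day_of_month defaults to 15.
starts = pd.date_range("2024-01-01", periods=4, freq=pd.offsets.SemiMonthBegin())
ends = pd.date_range("2024-01-01", periods=4, freq=pd.offsets.SemiMonthEnd())

# "SMS": each semi-month begins on the 1st or the 15th.
print(starts.strftime("%Y-%m-%d").tolist())  # ['2024-01-01', '2024-01-15', '2024-02-01', '2024-02-15']
# "SM": each semi-month ends on the 15th or the last day of the month (2024 is a leap year).
print(ends.strftime("%Y-%m-%d").tolist())    # ['2024-01-15', '2024-01-31', '2024-02-15', '2024-02-29']
```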

I'd like to have a "semi_month" feature. For example, the 1st of January would be semi_month 1, the 18th of January would be semi_month 2, and the 3rd of March would be semi_month 5. Basically, you split each month into two semi-months, which gives 24 semi-months per year.

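To be consistent with the SM and SMS pandas frequencies, I'd consider two variants:

semi_month_start: aligned with "SMS" (1st and 15th). Days 1 to 14 of month m fall in semi_month 2m - 1, and days 15 to the end of the month fall in semi_month 2m.

semi_month_end: aligned with "SM" (15th and end of month). Days 1 to 15 of month m fall in semi_month 2m - 1, and days 16 to the end of the month fall in semi_month 2m.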

So, for example, with "semi_month_start", the 15th of February is semi_month 4, while for "semi_month_end" it is semi_month 3.
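One way to check that the two variants really follow the pandas frequencies is to derive the index from the SMS/SM anchor dates themselves: roll a timestamp back to the first day of its semi-month with `SemiMonthBegin.rollback`, or forward to its last day with `SemiMonthEnd.rollforward`, then number the anchor. This is only a sketch; the helper names `semi_month_start` and `semi_month_end` are illustrative, and working element by element like this would be much slower than a vectorized formula on a large Series.

```python
import pandas as pd

SMS = pd.offsets.SemiMonthBegin()  # anchors on the 1st and the 15th
SM = pd.offsets.SemiMonthEnd()     # anchors on the 15th and the month end


def semi_month_start(ts: pd.Timestamp) -> int:
    # Hypothetical helper: roll back to the semi-month's first day (1st or 15th).
    anchor = SMS.rollback(ts)
    return 2 * (anchor.month - 1) + (1 if anchor.day == 1 else 2)


def semi_month_end(ts: pd.Timestamp) -> int:
    # Hypothetical helper: roll forward to the semi-month's last day (15th or month end).
    # The month end is itself an anchor, so this never spills into the next month.
    anchor = SM.rollforward(ts)
    return 2 * (anchor.month - 1) + (1 if anchor.day == 15 else 2)


feb_15 = pd.Timestamp("2024-02-15")
print(semi_month_start(feb_15), semi_month_end(feb_15))  # 4 3
```

The 15th is the only day on which the two variants disagree: it opens a semi-month under SMS but closes one under SM.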

Describe alternatives you've considered
The only alternative is to have 2 separate features, one for the month and one for the day.

P.S.

The feature functions could be something like this:

"semi_month_start": lambda x: (x.dt.day >= 15).astype(np.int64) + (x.dt.month - 1) * 2 + 1,  # days 15 onwards are the second half
"semi_month_end": lambda x: (x.dt.day > 15).astype(np.int64) + (x.dt.month - 1) * 2 + 1,  # days 16 onwards are the second half
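A self-contained sketch of the two proposed functions, applied to a pandas datetime Series. The comparisons use `>= 15` and `> 15` so that the output matches the worked examples in this issue (1st of January is 1, 18th of January is 2, 3rd of March is 5, and the 15th of February is 4 versus 3). The dictionary name `SEMI_MONTH_FEATURES` is hypothetical and is not an existing feature_engine API.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from feature name to extraction function.
# Month m contributes semi-months 2m - 1 (first half) and 2m (second half).
SEMI_MONTH_FEATURES = {
    # Halves start on the 1st and the 15th, like pandas "SMS".
    "semi_month_start": lambda x: (x.dt.day >= 15).astype(np.int64) + (x.dt.month - 1) * 2 + 1,
    # Halves end on the 15th and the last day of the month, like pandas "SM".
    "semi_month_end": lambda x: (x.dt.day > 15).astype(np.int64) + (x.dt.month - 1) * 2 + 1,
}

dates = pd.Series(pd.to_datetime(
    ["2024-01-01", "2024-01-18", "2024-02-15", "2024-03-03", "2024-12-31"]
))
features = pd.DataFrame({name: func(dates) for name, func in SEMI_MONTH_FEATURES.items()})
print(features)
```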
solegalli commented 11 months ago

Thank you for the suggestion @candalfigomoro

If you'd like to make a PR to add the requested functionality, that would be great!

Otherwise, we will pick this up later.

Thank you!