business-science / pytimetk

Time series easier, faster, more fun. Pytimetk.
https://business-science.github.io/pytimetk/
MIT License

.augment_lags() and .augment_leads() value_column only accepts numeric dtype #295

Closed nauscj closed 4 months ago

nauscj commented 4 months ago

Allow .augment_lags() and .augment_leads() to accept non-numeric dtypes. It is often useful to get the lag of a string column or even of the date_column itself. For example, in an irregular time series you often want the time difference between an event and the previous occurrence of that event.

import pandas as pd
import pytimetk as tk

df = tk.load_dataset('m4_daily', parse_dates=['date'])

df['string_value'] = df['value'].astype(str)

df.augment_lags(
    date_column='date',
    value_column='string_value',
    lags=(1, 7),
    engine='pandas'
)

TypeError: value_column (string_value) is not a numeric dtype.

Lag of string column

df['string_value_shifted'] = df.groupby(['id'])['string_value'].shift(1)

Lag of date column

df['date_shifted'] = df.groupby(['id'])['date'].shift(1)

df
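The irregular time series use case mentioned above (time since the previous occurrence of an event) can be sketched with plain pandas. This is only an illustration on made-up data: the `events` frame and the `date_lag_1` and `days_since_last` column names are assumptions, not part of the m4_daily dataset or of pytimetk.

```python
import pandas as pd

# Toy irregular event log; the data and column names are illustrative assumptions.
events = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-04", "2024-01-10", "2024-02-01", "2024-02-02"]
    ),
})

# Sort so that shift(1) really refers to the previous event in time.
events = events.sort_values(["id", "date"]).reset_index(drop=True)

# Lag of the date column within each group (the first row per id is NaT).
events["date_lag_1"] = events.groupby("id")["date"].shift(1)

# Elapsed days since the previous event of the same id (NaN where there is none).
events["days_since_last"] = (events["date"] - events["date_lag_1"]).dt.days

print(events)
```

Sorting before shifting matters here: `shift(1)` works on row order, so an unsorted frame would give the previous row rather than the previous event.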

mdancho84 commented 4 months ago

Ok, this is incorporated. The function still requires a date_column, as is standard in pytimetk, but value_column now accepts non-numeric dtypes.
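For readers who cannot upgrade yet, grouped lags and leads of a non-numeric column can be approximated in plain pandas. The sketch below is not pytimetk's implementation: `add_shifts` is a hypothetical helper, and the `_lag_k` / `_lead_k` column names are an assumed naming scheme chosen for illustration.

```python
import pandas as pd

def add_shifts(df, group_col, value_col, lags=(), leads=()):
    """Hypothetical helper: append lag/lead columns for a column of any dtype."""
    out = df.copy()
    grouped = out.groupby(group_col)[value_col]
    for k in lags:
        # A positive shift looks back k rows within each group.
        out[f"{value_col}_lag_{k}"] = grouped.shift(k)
    for k in leads:
        # A negative shift looks ahead k rows within each group.
        out[f"{value_col}_lead_{k}"] = grouped.shift(-k)
    return out

# Toy data with a string column; rows are assumed to be in time order per id.
toy = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "status": ["open", "hold", "closed", "open", "closed"],
})

shifted = add_shifts(toy, "id", "status", lags=[1], leads=[1])
print(shifted)
```

Because `shift` only moves values and never does arithmetic on them, it works the same way for strings, dates, and numbers.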

Until the next version is released on PyPI, you can install the development version from GitHub:

pip install git+https://github.com/business-science/pytimetk.git

Lags: [image]

Leads: [image]

nauscj commented 4 months ago

Thanks Matt!