antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License
459 stars 72 forks

Investigate business days / hours impact #210

Open antoinecarme opened 2 years ago

antoinecarme commented 2 years ago

Investigate business days / hours impact.

Impact : better handling of irregular physical time stamps (business dates). Incremental. May have a workaround with non-physical time equivalent.

  1. PyAF models do not take weekends or lunch breaks into account when computing future dates.
  2. This can affect date-dependent time-series models.
  3. No impact on signal transformation.
  4. Impact on time-based trends (linear, polynomial, etc.). No impact on other/stochastic trends (lag1, etc.).
  5. Impact on seasonal values (for DayOfWeek, will t+3 be a Monday or a Thursday?).
  6. No impact on AR-like models (the previous date can skip weekends).
  7. Business hours should be investigated further (the next business hour skips lunch ;).
  8. The time column will be smarter and generate more business-friendly forecast dates and forecast values. Model explanation improves.
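
To make point 5 concrete, here is a minimal sketch of how the DayOfWeek of t+3 differs between calendar-day and business-day stepping. It assumes the default Monday-to-Friday calendar of `pd.offsets.BDay` (no holidays); the variable names are illustrative only and not part of PyAF.

```python
import pandas as pd

last_obs = pd.Timestamp('2000-01-06')  # last observed date, a Thursday
horizon = 3

# Calendar stepping: every day counts, so t+3 can land on a weekend
calendar_date = last_obs + pd.Timedelta(days=horizon)

# Business stepping: Saturdays and Sundays are skipped
business_date = last_obs + pd.offsets.BDay(horizon)

print(calendar_date.date(), calendar_date.day_name())
print(business_date.date(), business_date.day_name())
```

With calendar stepping the forecast date falls on a Sunday, while business-day stepping gives the following Tuesday, so a DayOfWeek seasonal would look up a different value in each case.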

Implementation impacts :

  1. The delta of the dates is computed as the mean difference between two consecutive dates. The notion of "consecutive" will be impacted. Is the most frequent diff better than the average diff in this case?
  2. The next date value will skip some intermediate "non-business" values.
  3. For the tests, we can use pd.date_range and pd.bdate_range as time values:

    # 1000 consecutive business days
    pd.bdate_range('2000-1-1', periods=1000)

    # 1000 consecutive business hours
    pd.bdate_range('2000-1-1', periods=1000, freq='BH')

  4. Impact on plots?
  5. Activate by default?
  6. Automatic detection based on HourOfWeek/DayOfWeek distribution?
  7. Use the pd.bdate_range implementation to compute the next date?
  8. Not sure whether this depends on the locale/country/culture, etc.
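
A rough sketch for the questions in points 1 and 6, comparing the mean and the most frequent difference on a business-day index. The weekend check at the end is only a hypothetical detection heuristic, not something PyAF implements.

```python
import pandas as pd

# 1000 consecutive business days, starting on a Monday
dates = pd.Series(pd.bdate_range('2000-01-03', periods=1000))
diffs = dates.diff().dropna()

# Mean gap: inflated by the 3-day Friday-to-Monday jumps
mean_delta = diffs.mean()

# Most frequent gap: the 1-day step seen on ordinary weekdays
mode_delta = diffs.value_counts().idxmax()

# Hypothetical heuristic: a business calendar never shows Saturday/Sunday
has_weekend = dates.dt.dayofweek.isin([5, 6]).any()

print(mean_delta, mode_delta, has_weekend)
```

On this index the mean difference is roughly 1.4 days while the most frequent difference is exactly 1 day, which suggests the mode is the more natural "delta" when weekends are absent from the data.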

antoinecarme commented 1 year ago

next business day

>>> import pandas as pd
>>> pd.bdate_range('2000-1-1', periods=2)
DatetimeIndex(['2000-01-03', '2000-01-04'], dtype='datetime64[ns]', freq='B')
>>> pd.bdate_range('2000-1-2', periods=2)
DatetimeIndex(['2000-01-03', '2000-01-04'], dtype='datetime64[ns]', freq='B')
>>> pd.bdate_range('2000-1-3', periods=2)
DatetimeIndex(['2000-01-03', '2000-01-04'], dtype='datetime64[ns]', freq='B')
>>> pd.bdate_range('2000-1-4', periods=2)
DatetimeIndex(['2000-01-04', '2000-01-05'], dtype='datetime64[ns]', freq='B')

>>> lTwoNextBusinessDays = pd.bdate_range('2000-1-1', periods=2)
>>> lTwoNextBusinessDays[0]
Timestamp('2000-01-03 00:00:00', freq='B')

antoinecarme commented 1 year ago

>>> import pandas as pd
>>> def next_business_day(x):
...     lNextTwoBusinessDays = pd.bdate_range(x, periods=2)
...     lDays = [d for d in lNextTwoBusinessDays if (d > pd.Timestamp(x))]
...     return lDays[0]
... 
>>> next_business_day('2000-1-1')
Timestamp('2000-01-03 00:00:00', freq='B')
>>> next_business_day('2000-1-2')
Timestamp('2000-01-03 00:00:00', freq='B')
>>> next_business_day('2000-1-3')
Timestamp('2000-01-04 00:00:00', freq='B')
>>> next_business_day('2000-1-4')
Timestamp('2000-01-05 00:00:00', freq='B')
>>> next_business_day('2000-1-5')
Timestamp('2000-01-06 00:00:00', freq='B')
>>> next_business_day('2000-1-6')
Timestamp('2000-01-07 00:00:00', freq='B')
>>> next_business_day('2000-1-7')
Timestamp('2000-01-10 00:00:00', freq='B')
>>> next_business_day('2000-1-8')
Timestamp('2000-01-10 00:00:00', freq='B')
>>> next_business_day('2000-1-9')
Timestamp('2000-01-10 00:00:00', freq='B')
>>> next_business_day('2000-1-10')
Timestamp('2000-01-11 00:00:00', freq='B')
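
The same lookup can probably be written with a pandas offset instead of filtering a two-element range. A minimal sketch, assuming `pd.offsets.BDay` with its default Monday-to-Friday calendar (no holidays); the name `next_business_day_offset` is hypothetical and not part of PyAF.

```python
import pandas as pd

def next_business_day_offset(x):
    # Adding one business-day offset moves to the first business day strictly after x
    return pd.Timestamp(x) + pd.offsets.BDay(1)

# Saturday, Friday and Monday inputs
for day in ['2000-1-1', '2000-1-7', '2000-1-10']:
    print(day, '->', next_business_day_offset(day).date())
```

For these inputs the results should match the session above (2000-01-03, 2000-01-10 and 2000-01-11). Holidays would need a custom calendar, which ties back to the locale question.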

antoinecarme commented 1 year ago

>>> import pandas as pd
>>> 
>>> def next_business_hour(x):
...     lNextTwoBusinessHours = pd.date_range(x, periods=2, freq = 'BH')
...     lHours = [h for h in lNextTwoBusinessHours if (h > pd.Timestamp(x))]
...     print("next_business_hour" , (x , lHours[0]))
...     return lHours[0]
... 
>>> next_business_hour('2000-1-10 08:00:00')
next_business_hour ('2000-1-10 08:00:00', Timestamp('2000-01-10 09:00:00', freq='BH'))
Timestamp('2000-01-10 09:00:00', freq='BH')
>>> next_business_hour('2000-1-10 09:00:00')
next_business_hour ('2000-1-10 09:00:00', Timestamp('2000-01-10 10:00:00', freq='BH'))
Timestamp('2000-01-10 10:00:00', freq='BH')
>>> next_business_hour('2000-1-10 10:00:00')
next_business_hour ('2000-1-10 10:00:00', Timestamp('2000-01-10 11:00:00', freq='BH'))
Timestamp('2000-01-10 11:00:00', freq='BH')
>>> next_business_hour('2000-1-10 11:00:00')
next_business_hour ('2000-1-10 11:00:00', Timestamp('2000-01-10 12:00:00', freq='BH'))
Timestamp('2000-01-10 12:00:00', freq='BH')
>>> next_business_hour('2000-1-10 12:00:00')
next_business_hour ('2000-1-10 12:00:00', Timestamp('2000-01-10 13:00:00', freq='BH'))
Timestamp('2000-01-10 13:00:00', freq='BH')
>>> next_business_hour('2000-1-10 12:03:00')
next_business_hour ('2000-1-10 12:03:00', Timestamp('2000-01-10 13:03:00', freq='BH'))
Timestamp('2000-01-10 13:03:00', freq='BH')
>>> next_business_hour('2000-1-10 13:00:00')
next_business_hour ('2000-1-10 13:00:00', Timestamp('2000-01-10 14:00:00', freq='BH'))
Timestamp('2000-01-10 14:00:00', freq='BH')
>>> next_business_hour('2000-1-10 22:00:00')
next_business_hour ('2000-1-10 22:00:00', Timestamp('2000-01-11 09:00:00', freq='BH'))
Timestamp('2000-01-11 09:00:00', freq='BH')
>>> next_business_hour('2000-1-10 23:00:00')
next_business_hour ('2000-1-10 23:00:00', Timestamp('2000-01-11 09:00:00', freq='BH'))
Timestamp('2000-01-11 09:00:00', freq='BH')
>>> next_business_hour('2000-1-10 00:00:00')
next_business_hour ('2000-1-10 00:00:00', Timestamp('2000-01-10 09:00:00', freq='BH'))
Timestamp('2000-01-10 09:00:00', freq='BH')
>>> 
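
Note that the default 'BH' calendar used above runs from 09:00 to 17:00 without a break, so 12:00 maps to 13:00 and lunch is not skipped. A possible sketch with split opening intervals follows. The 12:00-13:00 lunch break and the name `next_business_hour_with_lunch` are assumptions for illustration, and passing lists to `start`/`end` requires a reasonably recent pandas.

```python
import pandas as pd

# Assumed opening hours: 09:00-12:00 and 13:00-17:00 (the lunch break is hypothetical)
BUSINESS_HOUR = pd.offsets.BusinessHour(start=['09:00', '13:00'],
                                        end=['12:00', '17:00'])

def next_business_hour_with_lunch(x):
    # Add one hour of opening time; closed periods (lunch, nights, weekends) are skipped
    return pd.Timestamp(x) + BUSINESS_HOUR

print(next_business_hour_with_lunch('2000-1-10 10:00:00'))
print(next_business_hour_with_lunch('2000-1-10 11:30:00'))
```

Here 11:30 consumes 30 minutes before lunch and 30 minutes after it, so it should land on 13:30 rather than 12:30.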

antoinecarme commented 1 year ago

Not sure whether this feature will be implemented. User value?

Delayed. Priority : low