antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License
459 stars, 72 forks

Resolving differing forecasts for items belonging to multiple groups #65

Closed: mgiangreco closed 7 years ago

mgiangreco commented 7 years ago

Let's say you have item1, which belongs to two groups: group1 and group2.

The columns of data are then: DateColumn, item1_group1, item1_group2

Training a hierarchical model on this data results in a forecast with columns like this: DateColumn, item1_group1_Forecast, item1_group2_Forecast

But we just want a single forecast column for item1. How do we resolve this?

antoinecarme commented 7 years ago

I am not sure I understand the question. I need more information:

  1. The meaning of the data columns. Adding an item2 to your example would make things clearer.
  2. A description of the hierarchy: plots, or a Jupyter notebook with anonymized data.
  3. Whether the list of forecast columns you gave is complete.

Changing the grouping order can produce an __item1_Forecast that may be relevant (one can experiment with group1_item2 instead of item1_group2 ;).

mgiangreco commented 7 years ago

Here's a toy example:

https://github.com/mgiangreco/pyaf_demos/blob/master/grouping_demo.ipynb

You can see in the last cell that the '2017-01-06' forecasts for 'item1_group1_OC_Forecast' and 'item1_group2_OC_Forecast' are slightly different, even though the forecasts are for the same item. My question is how to resolve this discrepancy.
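To make the discrepancy concrete, here is a minimal pandas sketch that measures the gap between the two columns and computes a plain average as a stopgap. The numeric values are hypothetical placeholders, not the notebook's output, and `item1_gap` and `item1_mean_Forecast` are names invented for this illustration. Averaging is only a naive post-processing step, not something PyAF is known to do.

```python
import pandas as pd

# Hypothetical forecast values for one date; the real ones are in the notebook.
forecast_df = pd.DataFrame({
    "DateColumn": pd.to_datetime(["2017-01-06"]),
    "item1_group1_OC_Forecast": [101.0],
    "item1_group2_OC_Forecast": [99.0],
})

cols = ["item1_group1_OC_Forecast", "item1_group2_OC_Forecast"]

# Size of the disagreement between the two forecasts of the same item.
forecast_df["item1_gap"] = forecast_df[cols[0]] - forecast_df[cols[1]]

# Naive single forecast: the row-wise mean of the two columns (an assumption,
# not a reconciliation method).
forecast_df["item1_mean_Forecast"] = forecast_df[cols].mean(axis=1)

print(forecast_df[["DateColumn", "item1_gap", "item1_mean_Forecast"]])
```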

antoinecarme commented 7 years ago

Are you asking why the item1 forecast differs from one group to another, even though item1 is the same column in both groups?

The hierarchical forecasts differ between the two groups because item1 is not grouped the same way in each. That is expected, unless the two groups are identical.

As an example, Monday forecast will be different if you group it with week days or month days.
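A small numeric sketch of this effect, assuming an ordinary least squares (OLS) reconciliation of the kind described in the hierarchical forecasting literature. Whether PyAF computes its forecast columns exactly this way is not confirmed here. The function `ols_reconcile`, the sibling series itemA and itemB, and all numbers are hypothetical.

```python
import numpy as np

def ols_reconcile(S, base):
    """Return coherent forecasts: the OLS projection of `base` onto the
    space of forecasts that add up according to the summing matrix S."""
    S = np.asarray(S, dtype=float)
    base = np.asarray(base, dtype=float)
    bottom = np.linalg.solve(S.T @ S, S.T @ base)  # reconciled bottom level
    return S @ bottom

# Rows: [group total, item1, sibling item]; columns: the two bottom series.
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])

# Same base forecast for item1 (10.0), but different siblings and totals.
base_group1 = [30.0, 10.0, 15.0]  # group1 total, item1, itemA (hypothetical)
base_group2 = [50.0, 10.0, 45.0]  # group2 total, item1, itemB (hypothetical)

item1_in_group1 = ols_reconcile(S, base_group1)[1]
item1_in_group2 = ols_reconcile(S, base_group2)[1]

print(item1_in_group1, item1_in_group2)
```

In group1 the total exceeds the sum of its parts, so item1 is adjusted upward; in group2 the total falls short, so item1 is adjusted downward. The same base forecast therefore ends up with two different reconciled values.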

I probably did not understand everything here.

mgiangreco commented 7 years ago

Yes, I think you understood my question. This is interesting:

"Monday forecast will be different if you group it with week days or month days."

To use the example provided in the Hyndman/Athanasopoulos text:

"...series can be naturally grouped together based on attributes without necessarily imposing a hierarchical structure. For example the bicycles sold by the warehouse can be for males, females or unisex. They can be used for racing, commuting or recreational purposes. They can be single speed or have multiple gears. Frames can be carbon, aluminium or steel."

The forecast for bicycles sold by the warehouse would be different, depending on which group (male vs. female vs. unisex, racing vs. commuting vs. recreational, single speed vs. multiple gears, carbon vs. aluminum vs. steel) is chosen.

It seems like what we would normally want in this case is not 4 separate forecasts, but rather one single forecast for bicycles sold that reconciles all of the data from the non-hierarchical groups to which the bicycle belongs. But I understand that this may be a limitation of the chosen method, rather than of your implementation, so I will close the issue.
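One way to picture such a single reconciled forecast is to put every grouping into one summing matrix and reconcile all series together. The sketch below does this with OLS for a reduced version of the bicycle example (two attributes with two values each). It is an illustration under stated assumptions, not PyAF's implementation, and the base forecast numbers are hypothetical.

```python
import numpy as np
from itertools import product

genders = ["male", "female"]
purposes = ["racing", "commuting"]
bottom = list(product(genders, purposes))  # 4 bottom-level series

# Summing matrix: total, one row per gender, one row per purpose,
# then the bottom-level series themselves.
rows = [[1] * len(bottom)]
rows += [[int(g == b[0]) for b in bottom] for g in genders]
rows += [[int(p == b[1]) for b in bottom] for p in purposes]
rows += np.eye(len(bottom), dtype=int).tolist()
S = np.array(rows, dtype=float)  # shape (9, 4)

# Hypothetical, mutually inconsistent base forecasts in the same row order.
base = np.array([100.0, 55.0, 48.0, 40.0, 63.0, 20.0, 33.0, 18.0, 29.0])

# OLS reconciliation: least-squares fit of the bottom level, then re-aggregate.
bottom_hat, *_ = np.linalg.lstsq(S, base, rcond=None)
coherent = S @ bottom_hat

total = coherent[0]
by_gender = coherent[1:3].sum()
by_purpose = coherent[3:5].sum()
print(total, by_gender, by_purpose)  # all three agree after reconciliation
```

After reconciliation there is one total for bicycles sold, and it matches the sum over either grouping.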

antoinecarme commented 7 years ago

It is probably a semantic issue (the choice of groupings and their order). There are many possible groupings.

This is something that a toy example cannot convey.