antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Zero division error - cannot figure out source. #111

Closed andmib closed 5 years ago

andmib commented 5 years ago

Hello,

I have some sparse hierarchical data that I am running through pyaf. None of my individual timeseries are entirely 0, yet I'm getting a divide by zero error when trying to run lEngine.train on my dataset.

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/pyaf/HierarchicalForecastEngine.py in train(self, iInputDS, iTime, iSignal, iHorizon, iHierarchy, iExogenousData)
     22         try:
---> 23             self.train_HierarchicalModel(iInputDS, iTime, iSignal, iHorizon, iHierarchy, iExogenousData);
     24         except tsutil.PyAF_Error as error:

~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/pyaf/HierarchicalForecastEngine.py in train_HierarchicalModel(self, iInputDS, iTime, iSignal, iHorizon, iHierarchy, iExogenousData)
     93         self.mSignalHierarchy = lSignalHierarchy;
---> 94         self.mSignalHierarchy.fit();
     95 

~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/pyaf/TS/SignalHierarchy.py in fit(self)
    186         self.create_all_levels_models(lAllLevelsDataset, self.mHorizon, self.mDateColumn);
--> 187         self.computeTopDownHistoricalProportions(lAllLevelsDataset);
    188         lForecast_DF = self.internal_forecast(self.mTrainingDataset , self.mHorizon)

~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/pyaf/TS/SignalHierarchy.py in computeTopDownHistoricalProportions(self, iAllLevelsDataset)
    273                         self.mAvgHistProp[col][col1] = (lEstim[col1] / lEstim[col]).mean();
--> 274                         self.mPropHistAvg[col][col1] = lEstim[col1].mean() / lEstim[col].mean();
    275         # print("AvgHitProp\n", self.mAvgHistProp);

ZeroDivisionError: float division by zero

Any ideas as to what this might be or how to debug the issue in my dataset or the code itself?

antoinecarme commented 5 years ago

Hi @andmib

Thanks for using pyaf. You are having an issue with computing "Proportions of the historical averages" in a top-down approach for a hierarchical model.

Is it possible to share your model, with anonymized data ? This will allow me to solve the issue for you and all other pyaf users. This is my preferred way (creating an issue pointed you to this).

Otherwise, you can still try to solve the problem on your own by protecting the division by zero. You can, for example, dump the pandas dataframe lEstim to a csv file and print (col1, col) to see if the mean is really zero.
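
A minimal sketch of that check, assuming lEstim is a pandas DataFrame with one numeric column per signal. The helper names (find_zero_mean_columns, safe_ratio_of_means), the toy column names, the dump path, and the choice of 0.0 as a fallback for an undefined proportion are illustrative assumptions, not part of pyaf:

```python
import os
import tempfile

import pandas as pd


def find_zero_mean_columns(estim: pd.DataFrame) -> list:
    """Return the names of columns whose mean is exactly zero."""
    means = estim.mean()
    return [col for col in estim.columns if means[col] == 0.0]


def safe_ratio_of_means(estim: pd.DataFrame, num_col: str, den_col: str,
                        default: float = 0.0) -> float:
    """Compute mean(num_col) / mean(den_col), guarding a zero denominator.

    `default` is an arbitrary fallback chosen for this sketch.
    """
    den_mean = float(estim[den_col].mean())
    if den_mean == 0.0:
        return default
    return float(estim[num_col].mean()) / den_mean


# Toy stand-in for lEstim; the real frame is built inside pyaf.
lEstim = pd.DataFrame({
    "total": [4.0, 6.0, 2.0],
    "group_a": [4.0, 6.0, 2.0],
    "group_b": [0.0, 0.0, 0.0],  # a signal that is all zeros
})

# Dump the frame so it can be inspected offline.
csv_path = os.path.join(tempfile.gettempdir(), "lEstim_dump.csv")
lEstim.to_csv(csv_path, index=False)

zero_cols = find_zero_mean_columns(lEstim)
print("zero-mean columns:", zero_cols)
```

Any column reported here would trigger the error if it ended up as the denominator in the proportion computation shown in the traceback.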

andmib commented 5 years ago

@antoinecarme Thanks for the quick response! Unfortunately I wouldn't feel comfortable trying to anonymize the data and send it over, so I'm going to try your latter suggestion of dumping lEstim into a CSV and seeing where the mean is zero. I will keep you updated!

antoinecarme commented 5 years ago

Thanks. Your feedback is welcome. I would be interested in a sketch of the diagnosis, so that I can reproduce it on an artificial dataset/hierarchy.

andmib commented 5 years ago

@antoinecarme Looks like it is simply a sparsity issue - I have many groups that, when split into train/test, are very sparse. Looks like I just need to drop those groups, because they are 0 in lEstim.

antoinecarme commented 5 years ago

@andmib

Good point. Did not think of sparsity, especially when the groups are generated by pyaf.

  1. What type of hierarchy are you using ?
  2. Does this happen on intermediate hierarchical levels/signals ?

antoinecarme commented 5 years ago

Pyaf can probably take care of discarding such groups automatically.

andmib commented 5 years ago

> @andmib
>
> Good point. Did not think of sparsity, especially when the groups are generated by pyaf.
>
> 1. What type of hierarchy are you using ?
> 2. Does this happen on intermediate hierarchical levels/signals ?

  1. Grouped, based on the grouped example ipython notebook.
  2. Only the most granular level - at higher levels of the hierarchy, the mean is not 0 (but close on some).

antoinecarme commented 5 years ago

Very interesting. Can you please share the pickled model (not the data) ?

andmib commented 5 years ago

> Pyaf can probably take care of discarding such groups automatically.

But let's say you have two groups - wine-red-america and wine-red-europe, where wine-red-europe is super sparse or close to 0. You may still want a (probably bad) forecast for wine-red-europe based off the wine-red grouping?

andmib commented 5 years ago

> Very interesting. Can you please share the pickled model (not the data) ?

I haven't gotten to a successfully trained model yet ;) It keeps failing with the zero division error, so before using pyaf I'm dropping groups from my original data that fall below some threshold of available data.

antoinecarme commented 5 years ago

Yes. You are right. But this needs to be investigated.

antoinecarme commented 5 years ago

You can pickle the model before training. That's already good for me (and safe for you ;).

andmib commented 5 years ago

> You can pickle the model before training. That's already good for me (and safe for you ;).

Can you show me at what point I pickle the model in the example "grouped" model ipynb?

antoinecarme commented 5 years ago

Just before the last line here :

[image not preserved]

andmib commented 5 years ago

> Just before the last line here :
>
> [image not preserved]

So joblib.dump(lEngine, 'pickledfile.pkl')? It's failing right at the .train line.

antoinecarme commented 5 years ago

Did you add

joblib.dump(lEngine, 'pickledfile.pkl')

before

lSignalHierarchy = lEngine.train(train_df , lDateColumn, lSignalVar, 1, lHierarchy, None)

??

andmib commented 5 years ago

> Did you add
>
> joblib.dump(lEngine, 'pickledfile.pkl')
>
> before
>
> lSignalHierarchy = lEngine.train(train_df , lDateColumn, lSignalVar, 1, lHierarchy, None)
>
> ??

No, that's what I'm asking - which object do you want me to pickle?

antoinecarme commented 5 years ago

lEngine

andmib commented 5 years ago

> lEngine

Do you have an email address/box I can shoot this over to you?

antoinecarme commented 5 years ago

Sorry, will need also the pickle of another object : lHierarchy.
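
A rough sketch of saving both objects before training, using the standard-library pickle module in place of joblib so that it runs without extra dependencies. The helper name dump_before_training and the placeholder objects are assumptions made for illustration; in the notebook, the real lEngine and lHierarchy would be passed instead, and whether they pickle cleanly this way has not been verified here:

```python
import os
import pickle
import tempfile


def dump_before_training(objects: dict, directory: str) -> list:
    """Pickle each named object to <directory>/<name>.pkl and return the paths."""
    paths = []
    for name, obj in objects.items():
        path = os.path.join(directory, name + ".pkl")
        with open(path, "wb") as handle:
            pickle.dump(obj, handle)
        paths.append(path)
    return paths


# Placeholders for this sketch only; in the notebook these would be the
# real objects, created before lEngine.train(...) is called.
lEngine = {"placeholder": "forecast engine"}
lHierarchy = {"placeholder": "hierarchy definition"}

out_dir = tempfile.mkdtemp()
saved = dump_before_training(
    {"lEngine": lEngine, "lHierarchy": lHierarchy}, out_dir
)
print(saved)
```

Because the dump happens before the call to train, the files are written even if training later fails with the division error.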

My public email is on the top of this file :

https://github.com/antoinecarme/pyaf/blob/master/TS/Signal_Grouping.py

andmib commented 5 years ago

> Sorry, will need also the pickle of another object : lHierarchy.
>
> My public email is on the top of this file :
>
> https://github.com/antoinecarme/pyaf/blob/master/TS/Signal_Grouping.py

Sent. But I did get this working. I just needed to drop groups with too few observations - in your example, that would be columns in train_df that are almost all 0.
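
For anyone hitting the same error, the workaround can be sketched roughly as follows, assuming train_df has one column per group plus a date column. The function name drop_sparse_columns, the 20% threshold, and the toy data are illustrative assumptions, not the exact filter used here:

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, signal_cols: list,
                        min_nonzero_fraction: float = 0.2) -> pd.DataFrame:
    """Drop signal columns whose share of non-zero rows is below the threshold."""
    # Fraction of rows in each signal column that hold a non-zero value.
    nonzero_fraction = (df[signal_cols].fillna(0) != 0).mean()
    sparse = nonzero_fraction[nonzero_fraction < min_nonzero_fraction].index
    return df.drop(columns=list(sparse))


# Toy stand-in for train_df.
train_df = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "wine_red_america": [3.0, 0.0, 5.0, 2.0, 4.0],
    "wine_red_europe": [0.0, 0.0, 0.0, 0.0, 0.0],  # all zeros
})

signals = ["wine_red_america", "wine_red_europe"]
filtered_df = drop_sparse_columns(train_df, signals, min_nonzero_fraction=0.2)
print(list(filtered_df.columns))
```

The threshold is a judgment call, and the hierarchy definition may also need to be adjusted to match the remaining columns; that part is not covered by this sketch.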

antoinecarme commented 5 years ago

received your message.

Thanks. Will keep this page updated when something changes. Leaving.

antoinecarme commented 5 years ago

Nice to hear that's OK for you. You can close this issue if you want.

andmib commented 5 years ago

> Nice to hear that's OK for you. You can close this issue if you want.

Will do - this is a very helpful package by the way, and I will keep you updated.