andmib closed this issue 5 years ago.
Hi @andmib
Thanks for using pyaf. You are having an issue with computing "Proportions of the historical averages" in a top-down approach for a hierarchical model.
Is it possible to share your model with anonymized data? This would allow me to solve the issue for you and for all other pyaf users. This is my preferred way (which is why you were pointed to creating an issue).
Otherwise, you can still try to solve the problem on your own by protecting the division by zero. For example, you can dump the pandas dataframe lEstim to a CSV file and print (col1, col) to see if the mean is really zero.
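A minimal sketch of what "protecting the division by zero" could look like. It assumes the historical averages are computed with pandas over a frame that has one column per signal, and that each proportion is a child's average divided by its parent's average. The frame, the column names and the helper safe_proportions are hypothetical stand-ins, not pyaf's internal lEstim or its actual code.

```python
import os
import tempfile

import pandas as pd


def safe_proportions(estim: pd.DataFrame, parent: str, children: list) -> dict:
    """Hypothetical helper: each child's historical average divided by the
    parent's historical average, guarded against a zero parent average."""
    parent_mean = estim[parent].mean()
    if parent_mean == 0:
        # Avoid the division by zero: fall back to equal shares.
        # This fallback is an arbitrary choice made for the sketch.
        return {child: 1.0 / len(children) for child in children}
    return {child: estim[child].mean() / parent_mean for child in children}


# Toy estimation window (a stand-in for lEstim) in which every series is zero.
estim = pd.DataFrame({
    "wine_red": [0.0, 0.0, 0.0],
    "wine_red_america": [0.0, 0.0, 0.0],
    "wine_red_europe": [0.0, 0.0, 0.0],
})

# Dump the frame to a CSV file for inspection and print the column means.
dump_path = os.path.join(tempfile.gettempdir(), "lEstim_dump.csv")
estim.to_csv(dump_path, index=False)
print(estim.mean())
print(safe_proportions(estim, "wine_red", ["wine_red_america", "wine_red_europe"]))
```

Falling back to equal shares keeps every child forecastable; returning zero shares instead would be an equally defensible choice.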
@antoinecarme Thanks for the quick response! Unfortunately I wouldn't feel comfortable trying to anonymize the data and send it over, so I'm going to try your latter suggestion of dumping lEstim into a CSV and seeing where the mean is zero. I will keep you updated!
Thanks. Your feedback is welcome. I would be interested in a sketch of the diagnosis, so that I can reproduce it on an artificial dataset/hierarchy.
@antoinecarme Looks like it is simply a sparsity issue: many of my groups are very sparse once the data is split into train/test. It looks like I just need to drop those groups, because they are 0 in lEstim.
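The sketch below shows how a series that is not entirely zero can still have a zero average in the training window, which is enough to produce a zero mean in lEstim. The column names, dates and cut-off are hypothetical, and pyaf's own estimation/validation split may differ from this simple date-based split.

```python
import pandas as pd

# Hypothetical daily data: "store_b" is not all zero, but its only
# non-zero values fall after the train/test cut-off.
df = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=10, freq="D"),
    "store_a": [3, 1, 0, 2, 4, 0, 1, 2, 3, 1],
    "store_b": [0, 0, 0, 0, 0, 0, 0, 0, 5, 7],
})

cutoff = pd.Timestamp("2020-01-08")  # hypothetical split date
train_df = df[df["Date"] < cutoff]

signals = ["store_a", "store_b"]
full_means = df[signals].mean()
train_means = train_df[signals].mean()

# Series that look fine on the full data but average zero in training.
zero_in_train = [s for s in signals if full_means[s] != 0 and train_means[s] == 0]
print(zero_in_train)  # ['store_b']
```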
@andmib
Good point. I did not think of sparsity, especially when the groups are generated by pyaf.
Pyaf can probably take care of discarding such groups automatically.
1. What type of hierarchy are you using?
2. Does this happen on intermediate hierarchical levels/signals?
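To check question 2 on your own data, one option is to aggregate the leaf columns up the hierarchy and look for parent nodes whose training mean is zero. Assuming each proportion divides a child's average by its parent's average, a zero parent average is what would make that division fail. The parent-to-children dict below is a hypothetical representation, not pyaf's lHierarchy format.

```python
import pandas as pd

# Hypothetical leaf-level training data (one column per bottom-level signal).
train_df = pd.DataFrame({
    "wine_red_america": [4.0, 2.0, 3.0],
    "wine_red_europe": [0.0, 0.0, 0.0],
    "wine_white_europe": [0.0, 0.0, 0.0],
})

# Hypothetical hierarchy: parent -> children (not pyaf's lHierarchy format).
hierarchy = {
    "wine_red": ["wine_red_america", "wine_red_europe"],
    "wine_white": ["wine_white_europe"],
    "wine": ["wine_red", "wine_white"],
}


def node_series(node):
    """Return a node's series: a leaf column, or the recursive sum of its children."""
    if node in train_df.columns:
        return train_df[node]
    return sum(node_series(child) for child in hierarchy[node])


# Parents (including intermediate levels) whose training average is zero.
zero_mean_parents = [node for node in hierarchy if node_series(node).mean() == 0]
print(zero_mean_parents)  # ['wine_white']
```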
Very interesting. Can you please share the pickled model (not the data)?
But let's say you have two groups, wine-red-america and wine-red-europe, where wine-red-europe is super sparse or close to 0. Wouldn't you still want a (probably bad) forecast for wine-red-europe based on the wine-red grouping?
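Assuming the top-down split uses proportions of historical averages (each child's average divided by the parent's average), a sparse child like wine-red-europe still receives a small, probably poor, forecast as long as its parent's average is non-zero. All numbers below are made up for illustration.

```python
import pandas as pd

# Hypothetical history for the two children of "wine-red".
history = pd.DataFrame({
    "wine-red-america": [10.0, 12.0, 11.0, 9.0],
    "wine-red-europe": [0.0, 0.0, 1.0, 0.0],
})
parent_history = history.sum(axis=1)

# Proportions of the historical averages: child mean / parent mean.
proportions = history.mean() / parent_history.mean()

# Hypothetical forecast for the parent "wine-red" over 3 future periods.
parent_forecast = pd.Series([40.0, 42.0, 44.0])

# Each child's forecast is the parent forecast scaled by its proportion.
child_forecasts = pd.DataFrame(
    {child: parent_forecast * p for child, p in proportions.items()}
)
print(child_forecasts.round(2))
```

The sparse child is not dropped; it simply inherits a tiny share of the parent forecast.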
I haven't gotten to a successfully trained model yet ;) It keeps failing with a zero division error, so before using pyaf I'm dropping groups from my original data based on a threshold on how much data they have.
Yes, you are right. But it needs to be investigated.
You can pickle the model before training. That's already good for me (and safe for you ;).
Can you show me at what point I should pickle the model in the example "grouped" model ipynb?
Just before the last line here:
So joblib.dump(lEngine, 'pickledfile.pkl')? It's failing right at the .train line.
Did you add joblib.dump(lEngine, 'pickledfile.pkl') before lSignalHierarchy = lEngine.train(train_df, lDateColumn, lSignalVar, 1, lHierarchy, None)?
No, that's what I'm asking - which object do you want me to pickle?
lEngine
Do you have an email address/box I can shoot this over to you?
Sorry, I will also need the pickle of another object: lHierarchy.
My public email is at the top of this file:
https://github.com/antoinecarme/pyaf/blob/master/TS/Signal_Grouping.py
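A sketch of dumping both objects before the train call, so the files exist even when training fails. DummyEngine and the hierarchy dict are hypothetical stand-ins for the real lEngine and lHierarchy. The thread uses joblib.dump; the standard-library pickle module is used here only to keep the sketch free of extra dependencies, and it serves the same purpose.

```python
import os
import pickle
import tempfile


class DummyEngine:
    """Hypothetical stand-in for the untrained lEngine object."""

    def __init__(self):
        self.options = {"horizon": 1}


lEngine = DummyEngine()
# Hypothetical stand-in for lHierarchy (not pyaf's real structure).
lHierarchy = {"levels": ["wine", "color", "region"]}

out_dir = tempfile.mkdtemp()
# Dump both objects *before* calling lEngine.train(...), so the pickles
# exist even if training then fails with a ZeroDivisionError.
for name, obj in [("lEngine", lEngine), ("lHierarchy", lHierarchy)]:
    with open(os.path.join(out_dir, name + ".pkl"), "wb") as f:
        pickle.dump(obj, f)

# Whoever receives the files can reload them to reproduce the setup.
with open(os.path.join(out_dir, "lHierarchy.pkl"), "rb") as f:
    restored = pickle.load(f)
print(restored == lHierarchy)  # True
```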
Sent. But I did get this working: I just needed to drop groups with too few observations. In your example, those would be the columns in train_df that are almost all 0.
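One way to apply such a threshold is to keep only the columns whose share of non-zero training values reaches a minimum fraction. The helper name drop_sparse_columns, the column names and the threshold values are hypothetical. Any dropped group gets no forecast at all, which is the trade-off raised in the wine-red-europe example.

```python
import pandas as pd


def drop_sparse_columns(train_df, date_column, min_nonzero_fraction=0.2):
    """Drop signal columns whose share of non-zero values is below a threshold.

    The default threshold is an arbitrary assumption; tune it for your data.
    """
    signals = [c for c in train_df.columns if c != date_column]
    nonzero_fraction = (train_df[signals] != 0).mean()
    keep = nonzero_fraction[nonzero_fraction >= min_nonzero_fraction].index
    return train_df[[date_column] + list(keep)]


train_df = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "wine_red_america": [3, 0, 2, 4, 1],
    "wine_red_europe": [0, 0, 0, 0, 1],
})
print(drop_sparse_columns(train_df, "Date", 0.5).columns.tolist())
# ['Date', 'wine_red_america']
```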
Received your message.
Thanks. Will keep this page updated when something changes. Leaving.
Nice to hear that's OK for you. You can close this issue if you want.
Will do - this is a very helpful package by the way, and I will keep you updated.
Hello,
I have some sparse hierarchical data that I am running through pyaf. None of my individual time series are entirely 0, yet I'm getting a divide-by-zero error when trying to run lEngine.train on my dataset. Any ideas as to what this might be, or how to debug the issue in my dataset or in the code itself?