Closed jtfields closed 4 years ago
Hi, because AtsPy is currently aimed at univariate forecasts, what you describe is possible, but your multiple time series won't learn from each other when you forecast their future values. If you need that, you might have to use GluonTS directly. See the solution here: https://github.com/awslabs/gluon-ts/issues/190.
If you do want to go ahead, you might want to use parallel processing for efficiency (https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop). In a for loop, you can pickle the results after each iteration, or you can collect the results in a dictionary and pickle that.
pickle.dump(dict_in, open("save.p", "wb"))
dict_out = pickle.load(open("save.p", "rb"))
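As a rough sketch of the "dictionary plus pickle" idea above: the loop below forecasts each group separately and rewrites a checkpoint file after every iteration, so an interrupted run can resume. `forecast_one` is a hypothetical stand-in for the per-zip AtsPy call (it just returns the last value), and the toy data frame and `forecast_all` helper are assumptions for illustration only.

```python
import os
import pickle
import tempfile

import pandas as pd


def forecast_one(series: pd.Series) -> float:
    """Hypothetical stand-in for the per-zip forecast; returns the last value."""
    return float(series.iloc[-1])


def forecast_all(df, key_col, value_col, save_path):
    """Forecast each group separately, checkpointing the results dict to disk."""
    results = {}
    # Resume from an earlier run if a checkpoint already exists.
    if os.path.exists(save_path):
        with open(save_path, "rb") as f:
            results = pickle.load(f)
    for key, group in df.groupby(key_col):
        if key in results:
            continue  # already forecast in a previous run
        results[key] = forecast_one(group[value_col])
        # Rewrite the checkpoint after every group.
        with open(save_path, "wb") as f:
            pickle.dump(results, f)
    return results


# Toy data standing in for the Zillow frame (assumed column names).
toy = pd.DataFrame({
    "RegionName": [53012, 53012, 20601, 20601],
    "Value": [100.0, 110.0, 200.0, 190.0],
})
save_path = os.path.join(tempfile.mkdtemp(), "save.p")
results = forecast_all(toy, "RegionName", "Value", save_path)
```

Rewriting the whole dictionary each time is simple but grows slower as results accumulate; for 15,000 zip codes it may be worth checkpointing every N groups instead.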
I used the .map function instead of a for loop since it was faster: subset["forecast"] = subset["RegionName"].map(Forecast)
ValueError Traceback (most recent call last)
Here is the code...
def Forecast(Zip):
    zillowByZip = zillowUSA97to17.loc[zillowUSA97to17['RegionName']==Zip]
    if len(zillowByZip) < 3:
        return None
    elif len(zillowByZip) > 3:
        zillowByZip = zillowByZip[['Value', 'Date']]
        zillowByZip.Date = pd.to_datetime(zillowByZip.Date)
        zillowByZip = zillowByZip.set_index("Date")
        model_list=["Prophet"]
        am = AutomatedModel(df = zillowByZip, model_list=model_list, season="infer_from_data",forecast_len=60)
        forecast_in, performance = am.forecast_insample()
        forecast_out = am.forecast_outsample()
        all_ensemble_in, all_ensemble_out, all_performance = am.ensemble(forecast_in, forecast_out)
        forecast_out.head()
        performance
        all_performance
        all_ensemble_in[["Target","Prophet"]].plot()
        all_ensemble_in
        all_ensemble_out
        all_ensemble_out[["Prophet"]].plot()
        am.models_dict_in
        am.models_dict_out
subset = zillowUSA97to17.loc[zillowUSA97to17['State']=='MD']
subset["forecast"] = subset["RegionName"].map(Forecast)
I did some more tests and the state of Maryland errors out after 5 zip codes. The state of Maine errors out after about 25 zip codes. It seems like there is some value in AtsPy that I need to reset prior to each loop. Can anyone provide some guidance on this?
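One way to narrow down which zip codes fail, and why, is to wrap each call so that an exception is recorded instead of aborting the whole run. This is only a sketch: `run_isolated` and `toy_forecast` are hypothetical names, and the toy function simply raises for one key to imitate a failing zip code. It does not assume anything about AtsPy's internal state.

```python
import traceback


def run_isolated(keys, func):
    """Call func on each key; collect results and per-key tracebacks separately."""
    results, failures = {}, {}
    for key in keys:
        try:
            results[key] = func(key)
        except Exception:
            # Keep the full traceback text so the failing key can be inspected later.
            failures[key] = traceback.format_exc()
    return results, failures


def toy_forecast(zip_code):
    """Hypothetical stand-in for Forecast; fails for one key on purpose."""
    if zip_code == 20602:
        raise ValueError("not enough data")
    return zip_code * 2


results, failures = run_isolated([20601, 20602, 20603], toy_forecast)
```

Inspecting `failures` after the run shows whether the errors cluster on particular zip codes (for example, ones with unusual data) or appear after a fixed number of iterations.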
I'm working on a project to predict the top three zip codes in the US for increases in housing prices. I used AtsPy to predict the price for one zip code (53012) and now I want to loop over 15,000 zip codes. I'm looking for suggestions for how to do this most efficiently and ways to save the results for each loop. I know this is more of a "how to use" question than an issue. I searched Stack Overflow and since AtsPy is so new there are no posts related to it yet. Thanks again for a great new package for Python!
zillowByZip = zillowUSA1997to2020.loc[zillowUSA1997to2020['ZipCode']==53012]
zillowByZip = zillowByZip[['Value', 'Date']]
zillowByZip.Date = pd.to_datetime(zillowByZip.Date)
zillowByZip = zillowByZip.set_index("Date")
model_list=["Gluonts"]
am = AutomatedModel(df = zillowByZip, model_list=model_list, season="infer_from_data",forecast_len=60)
forecast_in, performance = am.forecast_insample()
forecast_out = am.forecast_outsample()
all_ensemble_in, all_ensemble_out, all_performance = am.ensemble(forecast_in, forecast_out)
forecast_out.head()
performance
all_performance
all_ensemble_in[["Target","Gluonts"]].plot()
all_ensemble_in
all_ensemble_out
all_ensemble_out[["Gluonts"]].plot()
am.models_dict_in
am.models_dict_out