I sent the data too:
@pcfierro
Thanks for using PyAF.
PyAF is an automatic, modular process that can be used to get a mechanical form of forecast.
It cannot be tuned to match what someone (even myself) likes according to some non-computational quality measure. It is simply neither expected nor designed to do that.
I had a look at the zip file containing the two datasets. I don't have access to the details of the training process (horizon etc.). Can you please provide the Python script that you use to perform the training?
Of course, you can always perform some kind of preprocessing on the signal (remove the two lines with outliers etc) before using PyAF. The more regular the signal, the better.
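Removing outliers can also be automated rather than done by hand-picking lines. A minimal sketch, assuming pandas and a Hampel-style rule (centred rolling median plus median absolute deviation); the helper name `drop_outliers`, the window of 5 and the 3-sigma threshold are illustrative choices, not part of PyAF.

```python
import numpy as np
import pandas as pd


def drop_outliers(s: pd.Series, window: int = 5, n_sigmas: float = 3.0) -> pd.Series:
    """Hampel-style filter: drop points far from their centred rolling median."""
    roll = s.rolling(window, center=True, min_periods=1)
    med = roll.median()
    # Median absolute deviation (MAD) of each window around its own median
    mad = roll.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    # 1.4826 * MAD estimates the standard deviation for Gaussian noise
    is_outlier = (s - med).abs() > n_sigmas * 1.4826 * mad
    return s[~is_outlier]


# Hypothetical values with a single spike
area = pd.Series([10.0, 11.0, 10.5, 50.0, 10.8, 11.2, 10.9])
cleaned = drop_outliers(area)
print(cleaned.index.tolist())
```

Only the spike at index 3 is dropped here. With many identical consecutive rows, the MAD of a window can be 0 and any change would then be flagged, so removing duplicates (or averaging per timestamp) first is advisable.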
Sure, I've pasted the code in this reply. The area is most concerning when it is trending, while the angle may have some wind-based seasonality from minute to minute but basically holds at a moving average.
ANGLE measure: AR with seasonality (the ExponentialSmoothing function in statsmodels tsa).
AREA measure: trend without seasonality.
In both cases I need a confidence interval.
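For a series like the angle, which mostly holds around a moving level, the confidence-interval requirement can be illustrated with simple exponential smoothing. A minimal pure-Python sketch, assuming a hand-picked smoothing weight `alpha` and Gaussian one-step errors; `ses_forecast` is a hypothetical helper, not the statsmodels or PyAF API. statsmodels' `ExponentialSmoothing` (in `statsmodels.tsa.holtwinters`) would estimate `alpha` when fitting; for the trending area series, Holt's linear-trend method is the analogous choice, with a different interval formula.

```python
import math


def ses_forecast(y, alpha, horizon, z=1.96):
    """Simple exponential smoothing with approximate prediction intervals.

    Returns (point, lower, upper) tuples for h = 1..horizon. The interval
    variance uses the additive-error ETS(A,N,N) formula
    sigma^2 * (1 + (h - 1) * alpha^2), with sigma^2 estimated from the
    in-sample one-step-ahead errors; z = 1.96 gives ~95% for Gaussian errors.
    """
    level = y[0]
    errors = []
    for obs in y[1:]:
        errors.append(obs - level)              # one-step-ahead forecast error
        level = level + alpha * (obs - level)   # move the level towards the new data
    sigma2 = sum(e * e for e in errors) / len(errors)
    result = []
    for h in range(1, horizon + 1):
        half_width = z * math.sqrt(sigma2 * (1 + (h - 1) * alpha ** 2))
        result.append((level, level - half_width, level + half_width))
    return result


# Hypothetical angle readings; alpha is chosen by hand, not fitted
angle = [10.0, 12.0, 11.0, 13.0]
for point, lower, upper in ses_forecast(angle, alpha=0.5, horizon=3):
    print(f"{point:.2f}  [{lower:.2f}, {upper:.2f}]")
```

The point forecast stays flat at the last level while the interval widens with the horizon, which is the expected shape for a series without trend.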
import pandas as pd
import pyaf.ForecastEngine as autof

# Raw strings: a plain 'C:\Users...' literal is a SyntaxError in Python 3 (the "\U" escape)
csvAreaFile2 = r'C:\Users\Owner\OneDrive\PROJECTS\Paradise\ShapesDetection\SofVideo\Forecast\ffe_area2.csv'
csvAngleFile2 = r'C:\Users\Owner\OneDrive\PROJECTS\Paradise\ShapesDetection\SofVideo\Forecast\ffe_angle2.csv'
fcstWin = 60 # 3 Hours

# AREA
df = pd.read_csv(csvAreaFile2, sep=r',', engine='python', skiprows=0)
df.columns = ['fDateO','area']
df['fDate'] = range(df.shape[0])  # row counter used as the time column
print(df.head())
lDateVar = 'fDate'
lSignalVar = 'area'
lEngine = autof.cForecastEngine()
lEngine.train(iInputDS = df , iTime='fDate', iSignal = 'area', iHorizon = fcstWin)
lEngine.getModelInfo() # => relative error 7% (MAPE)
df_forecast = lEngine.forecast(iInputDS = df , iHorizon = fcstWin)
print(df_forecast.columns)
# print(df_forecast['fDate'].tail(7).values)
print(df_forecast['area_Forecast'].tail(7).values)
print(lEngine.mSignalDecomposition.mTrPerfDetails.head())
print(lEngine.mSignalDecomposition.mBestModel.mTimeInfo.mResolution)
lEngine.standardPlots("forecastarea")

# ANGLE
df = pd.read_csv(csvAngleFile2, sep=r',', engine='python', skiprows=0)
df.columns = ['fDateO','angle']
df['fDate'] = range(df.shape[0])  # row counter used as the time column
print(df.head())
lDateVar = 'fDate'
lSignalVar = 'angle'
lEngine = autof.cForecastEngine()
lEngine.train(iInputDS = df , iTime='fDate', iSignal = 'angle', iHorizon = fcstWin)
lEngine.getModelInfo() # => relative error 7% (MAPE)
df_forecast = lEngine.forecast(iInputDS = df , iHorizon = fcstWin)
print(df_forecast.columns)
# print(df_forecast['fDate'].tail(7).values)
print(df_forecast['angle_Forecast'].tail(7).values)
print(lEngine.mSignalDecomposition.mTrPerfDetails.head())
print(lEngine.mSignalDecomposition.mBestModel.mTimeInfo.mResolution)
lEngine.standardPlots("forecastangle")
Thanks a lot for the scripts. I am playing with them. Some remarks:
INFO:pyaf.std:BEST_DECOMPOSITION '_area_Lag1Trend_residue_zeroCycle_residue_NoAR' [Lag1Trend + NoCycle + NoAR]
INFO:pyaf.std:TREND_DETAIL '_area_Lag1Trend' [Lag1Trend]
INFO:pyaf.std:CYCLE_DETAIL '_area_Lag1Trend_residue_zeroCycle' [NoCycle]
INFO:pyaf.std:AUTOREG_DETAIL '_area_Lag1Trend_residue_zeroCycle_residue_NoAR' [NoAR]
INFO:pyaf.std:MODEL_MAPE MAPE_Fit=0.0107 MAPE_Forecast=0.0057 MAPE_Test=0.0033
INFO:pyaf.std:MODEL_SMAPE SMAPE_Fit=0.012 SMAPE_Forecast=0.0057 SMAPE_Test=0.0031
INFO:pyaf.std:MODEL_MASE MASE_Fit=0.9986 MASE_Forecast=0.9946 MASE_Test=0.9833
INFO:pyaf.std:MODEL_L1 L1_Fit=129.77122995594152 L1_Forecast=228.7123323566562 L1_Test=151.35940099621064
INFO:pyaf.std:MODEL_L2 L2_Fit=1026.0115073384095 L2_Forecast=723.6191137588673 L2_Test=721.5447573136909
2. The data are not really a time series. There are a lot of consecutive identical lines like these for the same timestamp. They probably need some cleanup (remove duplicates).
09:27:00,62.293643951416016
09:27:00,62.293643951416016
09:27:00,62.293643951416016
09:27:00,62.293643951416016
09:27:00,62.293643951416016
3. An outlier occurs at 07:46:00; it can be removed.
07:45:00,72.05471801757812
07:46:00,172.352783203125
07:46:00,172.352783203125
07:46:00,172.352783203125
07:46:00,172.352783203125
07:46:00,172.352783203125
07:47:00,50.6080436706543
4. Can you give some feedback on the data? Where do they come from?
Can you please clean up the data, set the horizon to a reasonable value (minutes, not hours) and give me your feedback (new data and scripts welcome ;) ?
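To help read the MODEL_MAPE, MODEL_SMAPE and MODEL_MASE lines of the log above, here is a sketch of the textbook definitions of these metrics. These are generic formulas, not PyAF's own code, which may differ in details (for example the SMAPE scaling or which rows are scored); the series is hypothetical. The selected `Lag1Trend + NoCycle + NoAR` decomposition appears to amount to repeating the previous value, and MASE compares a model to exactly that naive forecast, which is consistent with the MASE values close to 1 in the log.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction (0.0107 means 1.07%)."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE, using the 2|a - f| / (|a| + |f|) variant."""
    return sum(2 * abs(a - f) / (abs(a) + abs(f))
               for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train):
    """MAE of the forecast divided by the in-sample MAE of the naive
    'repeat the previous value' forecast on the training series."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_mae = sum(abs(train[t] - train[t - 1])
                    for t in range(1, len(train))) / (len(train) - 1)
    return mae / naive_mae


# Hypothetical area values; the forecast simply repeats the previous value
series = [100.0, 102.0, 101.0, 105.0, 104.0]
actual, lag1 = series[1:], series[:-1]
print(round(mape(actual, lag1), 4), round(smape(actual, lag1), 4),
      mase(actual, lag1, series))
```

Because the forecast here is the naive one, MASE comes out at exactly 1.0; a value well below 1 would mean the model beats "repeat the last value".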
Thanks, that is better. With those suggestions, it seems the pandas DataFrame is now not treating the ASCII text as numeric time. I want to forecast at the minute level. The readings are approximately 12 seconds apart, so I aggregated them into per-minute averages to simplify, and that seemed to do better. Exponential smoothing and/or anomaly removal should help, but the area of the fire ellipse is trying to balance a moving average with a very important TREND. I have other methods, but this has been a great exercise; any other ideas are welcome. I think I saw your example of converting date strings to dates properly; I'm assuming I can do something similar with the datetime functions to convert properly to time.
The new zip files are aggregated, but the time column may not be converted to a time type in my code yet. ffe_area2.zip
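On converting the time strings: a minimal pandas sketch, assuming the CSV holds a time of day as `HH:MM:SS` strings. The header names `time` and `area`, the inline sample and the placeholder date are all hypothetical. The resulting `fDate` datetime column should then be usable as `iTime` instead of the row counter in the script above.

```python
import io

import pandas as pd

# Hypothetical sample in the "HH:MM:SS,value" layout of the attached files
raw = io.StringIO(
    "time,area\n"
    "07:45:00,72.05\n"
    "07:45:12,73.10\n"
    "07:45:24,71.90\n"
    "07:46:00,70.40\n"
    "07:46:12,69.60\n"
)
df = pd.read_csv(raw)

# The files only hold a time of day, so prepend an arbitrary placeholder date
df["fDate"] = pd.to_datetime("2020-01-01 " + df["time"], format="%Y-%m-%d %H:%M:%S")

# One row per minute: average the readings that fall in the same minute
per_minute = df.set_index("fDate")["area"].resample("1min").mean().reset_index()
print(per_minute)
```

Minutes without any reading come out as NaN after `resample`, so they would need to be filled or dropped before training.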
If this can help, I used something like this to remove duplicates and outliers in Python code without modifying the CSV file:
import pandas as pd

df = pd.read_csv(csvAreaFile2, sep=r',', engine='python', skiprows=0)
# rename first, so the filter below does not depend on the CSV header names
df.columns = ['fDateO','area']
# remove duplicates
df = df.drop_duplicates()
# remove outliers
df = df[df['fDateO'] != '07:46:00']
You really need more data (days). PyAF is a machine learning procedure: the model is estimated on one part of the dataset (the first 2h) and validated on the remaining, most recent part. You cannot expect a reliable confidence interval from this.
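To make the data-size argument concrete, here is a generic chronological split. It is a sketch of the principle (no shuffling, most recent data held out), not PyAF's exact internal split, whose proportions are not shown in this thread; the 180 one-minute points and the helper name `chronological_split` are hypothetical.

```python
def chronological_split(values, horizon):
    """Split a time-ordered sequence without shuffling: the oldest points
    are used for estimation, the most recent `horizon` points for validation."""
    if len(values) <= horizon:
        raise ValueError("need more observations than the forecast horizon")
    return values[:-horizon], values[-horizon:]


# Hypothetical: 3 hours of one-minute observations and a 60-step horizon
minutes = list(range(180))
estimation, validation = chronological_split(minutes, horizon=60)
print(len(estimation), len(validation))  # 120 60
```

With so little data, every error statistic, and largely the interval width derived from it, rests on a single 60-minute validation window; several days of data provide many such windows.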
Closing the issue after no response for 30 days. Not blocking. Please reopen if needed.
I am forecasting fire ellipses, using some angles and areas. I find your system PyAF very interesting and well done in that it is compartmentalized and very modular. I'm new to Python but not to math, statistics and forecasting.
It seems to me that in the angle case the series follows no trend, yet one anomalous period at the very beginning causes wider-than-needed confidence intervals.
Secondly, the area does follow a trend, but only the upper confidence bound is close to the expected trend line.
I have used AR, ARIMA, Holt, and Holt-Winters from statsmodels with somewhat better results. Is there a way to filter anomalous data, or to use exponential smoothing with weights that emphasize the more recent data? It seems your approach may require some adjusting or tuning on my part to get the forecast I am looking for.
Thanks
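Exponential smoothing already gives more weight to recent data, and pandas exposes the weight directly through `Series.ewm`. A minimal sketch with hypothetical values and a hand-picked `alpha`, checking that the recursive smoother equals an explicit geometric weighting of past observations.

```python
import pandas as pd

alpha = 0.3  # hypothetical weight; a larger alpha puts more emphasis on recent points
s = pd.Series([50.0, 52.0, 51.0, 55.0, 54.0])

# Recursive form: smoothed_t = alpha * x_t + (1 - alpha) * smoothed_{t-1}
smoothed = s.ewm(alpha=alpha, adjust=False).mean()

# Same value written as explicit weights: the observation k steps back gets
# alpha * (1 - alpha) ** k, and the oldest point keeps (1 - alpha) ** (n - 1)
n = len(s)
weights = [alpha * (1 - alpha) ** k for k in range(n - 1)] + [(1 - alpha) ** (n - 1)]
manual = sum(w * x for w, x in zip(weights, reversed(s.tolist())))
print(round(smoothed.iloc[-1], 4), round(manual, 4))
```

The weights decay geometrically with age and sum to 1, so tuning `alpha` is the single knob that trades responsiveness against smoothness.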