antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Strange Month column in a test dataset #142

Closed: antoinecarme closed this issue 3 years ago

antoinecarme commented 3 years ago

The 'Month' column seems wrong in this log:

https://github.com/antoinecarme/pyaf/blob/master/tests/references/exog_test_ozone_exogenous.log

INFO:pyaf.std:START_TRAINING 'Ozone'
      Date  Month  Exog2 Exog3 Exog4  Ozone       Time
0  1955-01   1955      1    AQ   P_R    2.7 1955-01-01
1  1955-02   1955      2    AR   P_R    2.0 1955-02-01
2  1955-03   1955      3    AS   P_S    3.6 1955-03-01
3  1955-04   1955      4    AT   P_U    5.0 1955-04-01
4  1955-05   1955      5    AU   P_V    6.5 1955-05-01
INFO:pyaf.std:END_TRAINING_TIME_IN_SECONDS 'Ozone' 18.12391185760498
INFO:pyaf.std:TIME_DETAIL TimeVariable='Time' TimeMin=1955-01-01T00:00:00.000000 TimeMax=1967-09-01T00:00:00.000000 TimeDelta=<DateOffset: months=1> Horizon=12
INFO:pyaf.std:SIGNAL_DETAIL_ORIG SignalVariable='Ozone' Length=204  Min=1.2 Max=8.7  Mean=3.8357843137254903 StdDev=1.4915592159401185
INFO:pyaf.std:SIGNAL_DETAIL_TRANSFORMED TransformedSignalVariable='_Ozone' Min=1.2 Max=8.7  Mean=3.8357843137254903 StdDev=1.4915592159401185

Need to find out where this error comes from: the original CSV file, or a computation? Also analyze the overall impact on the tests.
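
In the log, 'Month' holds the year (1955) while 'Date' holds year-month strings such as '1955-01', so 'Month' looks like it was filled from the wrong part of the date. Below is a minimal sketch of a consistency check using pandas. It assumes the column names shown in the log and assumes that 'Month' is meant to hold the month number (1-12); the function name `find_month_mismatches` is hypothetical and not part of PyAF.

```python
import pandas as pd


def find_month_mismatches(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose 'Month' value differs from the month in 'Date'.

    Assumption: 'Date' holds YYYY-MM strings (as in the log) and 'Month'
    is expected to hold the month number (1-12).
    """
    parsed = pd.to_datetime(df["Date"], format="%Y-%m")
    expected_month = parsed.dt.month
    # Element-wise comparison on the shared index; keep only mismatching rows.
    return df[df["Month"] != expected_month]


# The first five rows of the training log above (relevant columns only).
sample = pd.DataFrame({
    "Date": ["1955-01", "1955-02", "1955-03", "1955-04", "1955-05"],
    "Month": [1955, 1955, 1955, 1955, 1955],
    "Ozone": [2.7, 2.0, 3.6, 5.0, 6.5],
})

bad = find_month_mismatches(sample)
print(f"{len(bad)} of {len(sample)} rows have an inconsistent 'Month'")
```

Running the same function on the whole CSV loaded with `pd.read_csv` would show whether every row is affected, which helps separate a bad source file from a bad computation.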

antoinecarme commented 3 years ago

This log is produced by running the following test:

https://github.com/antoinecarme/pyaf/blob/master/tests/exog/test_ozone_exogenous.py

It builds a model on a dataset obtained by calling:

https://github.com/antoinecarme/pyaf/blob/f745910700787aa036d4caa8e869406c4db1a0cd/pyaf/Bench/TS_datasets.py#L122

antoinecarme commented 3 years ago

load_ozone_exogenous loads the following file:

trainfile = "https://raw.githubusercontent.com/antoinecarme/pyaf/master/data/ozone-la-exogenous.csv"


antoinecarme commented 3 years ago

Conclusion: the 'Month' column is already wrong in the original CSV dataset file.

  1. Who created this file? A script or a notebook?
  2. How widely is it used in the tests (impact)? Was it used as a basis for other CSV files?
  3. Is the 'Month' column used in the models?
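
For question 2, one rough way to estimate the impact is to search a local checkout for references to the file name and to its loader. The sketch below assumes plain substring search over `.py` and `.ipynb` files; the helper `find_references` is hypothetical, and it will miss indirect uses such as a CSV that was generated from this one.

```python
from pathlib import Path


def find_references(root: str, needles: list[str],
                    suffixes: tuple[str, ...] = (".py", ".ipynb")) -> dict[str, list[str]]:
    """Map each needle to the files under `root` (relative paths) that mention it."""
    hits: dict[str, list[str]] = {needle: [] for needle in needles}
    for path in sorted(Path(root).rglob("*")):
        # Only scan regular files with one of the requested extensions.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for needle in needles:
            if needle in text:
                hits[needle].append(path.relative_to(root).as_posix())
    return hits
```

For example, `find_references("tests", ["ozone-la-exogenous.csv", "load_ozone_exogenous"])` would list the test scripts that touch this dataset, either by URL or through the loader.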
antoinecarme commented 3 years ago

Fixed the dataset and its related tests.
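
The thread does not say how the file was repaired. One plausible fix, assumed here rather than taken from the actual commit, is to rebuild 'Month' from 'Date', again assuming 'Month' should hold the month number. The helper `repair_month_column` is hypothetical.

```python
import pandas as pd


def repair_month_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with 'Month' recomputed from 'Date' (YYYY-MM strings)."""
    fixed = df.copy()
    fixed["Month"] = pd.to_datetime(fixed["Date"], format="%Y-%m").dt.month
    return fixed


broken = pd.DataFrame({"Date": ["1955-01", "1955-02", "1955-12"],
                       "Month": [1955, 1955, 1955]})
print(repair_month_column(broken)["Month"].tolist())  # [1, 2, 12]
```

The repaired frame could then be written back with `to_csv(path, index=False)`. Any reference logs that print the training data, such as exog_test_ozone_exogenous.log, would change as a result.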

Closing.