antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

some reference to code for categorical time series data? #151

Closed Sandy4321 closed 3 years ago

Sandy4321 commented 3 years ago

great code

"Exogenous variables are dummified for the non-numeric types,"

Can you clarify this, for example with a reference to the code that handles categorical time series data?

antoinecarme commented 3 years ago

@Sandy4321

Thanks for using PyAF.

This is a Python script that was written some time ago for a model with categorical exogenous data:

https://github.com/antoinecarme/pyaf/blob/master/tests/exog/test_ozone_exogenous_with_categorical.py

PyAF has a set of tests for almost every feature in this directory:

https://github.com/antoinecarme/pyaf/blob/master/tests/

Hope this helps.

Sandy4321 commented 3 years ago

Thanks for the quick answer. It is not clear from here: https://github.com/antoinecarme/pyaf/blob/6b652e9d522442c1fb9052dd73b442a1a18e31c7/pyaf/Bench/TS_datasets.py#L151

Do you use one-hot encoding for categorical data? If not, then what?

antoinecarme commented 3 years ago

PyAF does categorical encoding in a way that is transparent to the end user. The user only has to provide a pandas exogenous dataframe with categorical columns. These are then detected and encoded internally.

This is a direct link to the exact code that performs this dummification:

https://github.com/antoinecarme/pyaf/blob/6b652e9d522442c1fb9052dd73b442a1a18e31c7/pyaf/TS/Exogenous.py#L93
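For readers who only want the general idea, here is a minimal sketch of dummifying the non-numeric columns of an exogenous dataframe with plain pandas. It is not PyAF's implementation (the linked source is the reference); the helper name `dummify_non_numeric` and the sample columns are made up for illustration.

```python
import pandas as pd


def dummify_non_numeric(exog: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode every non-numeric column; numeric columns pass through.

    Hypothetical helper for illustration, not part of PyAF.
    """
    non_numeric = exog.select_dtypes(exclude="number").columns.tolist()
    return pd.get_dummies(exog, columns=non_numeric, dtype=int)


# Made-up exogenous data: one numeric column and one categorical column.
exog = pd.DataFrame(
    {
        "temperature": [20.5, 22.0, 19.0, 21.5],
        "weekday": ["Mon", "Tue", "Mon", "Wed"],
    }
)

encoded = dummify_non_numeric(exog)
print(encoded)
```

The numeric column is left untouched, and each distinct value of `weekday` becomes its own 0/1 column.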

Sandy4321 commented 3 years ago

Great, thanks. Then what is the best example file from https://github.com/antoinecarme/pyaf/tree/6b652e9d522442c1fb9052dd73b442a1a18e31c7/notebooks_sandbox to run to see how it works in practice?

antoinecarme commented 3 years ago

Please stop using the sandbox directory. It is for experimental/developer stuff (mentioned when possible ;).

Use the docs directory instead.

This notebook is a good starting point:

https://github.com/antoinecarme/pyaf/blob/6b652e9d522442c1fb9052dd73b442a1a18e31c7/docs/PyAF_Exogenous.ipynb

Sandy4321 commented 3 years ago

I see you do use one-hot encoding. Is it not full one-hot, since some columns have only 0s?


antoinecarme commented 3 years ago

I am not sure about the terminology, but when all the columns of a variable are zero, it means that the variable has a missing or unknown/undeclared category value. Pandas allows customizing or ignoring some categories/values when needed.
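To illustrate how all-zero dummy columns can arise, here is a small sketch with plain pandas. It assumes a fixed list of declared categories (the `declared` list and the sample values are hypothetical) and is not taken from PyAF's code.

```python
import pandas as pd

# Hypothetical declared categories; "Sun" below is not among them.
declared = ["Mon", "Tue", "Wed"]
weekday = pd.Series(["Mon", "Sun", "Wed", None], name="weekday")

# Values outside the declared categories (and missing values) become NaN.
as_cat = weekday.astype(pd.CategoricalDtype(categories=declared))

# get_dummies emits no 1 for NaN unless dummy_na=True is requested.
dummies = pd.get_dummies(as_cat, prefix="weekday", dtype=int)

# Rows where every dummy column is 0: unknown or missing category.
all_zero = dummies.sum(axis=1) == 0
print(dummies.assign(all_zero=all_zero))
```

Passing `dummy_na=True` to `pd.get_dummies` would add an explicit column for the missing case instead of leaving the row all zeros.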

antoinecarme commented 3 years ago

Also, for significance reasons, some post-processing is performed: infrequent categories (count < 5) are ignored.
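A rough sketch of this kind of frequency filter, using only pandas. The threshold of 5 comes from the comment above; everything else (the helper name `dummify_frequent`, the sample data, and the way rare values are dropped) is an assumption for illustration and may differ from what PyAF actually does.

```python
import pandas as pd

MIN_COUNT = 5  # threshold quoted above; the surrounding logic is an assumption


def dummify_frequent(values: pd.Series, min_count: int = MIN_COUNT) -> pd.DataFrame:
    """One-hot encode a column, ignoring categories seen fewer than min_count times.

    Hypothetical helper for illustration, not part of PyAF.
    """
    counts = values.value_counts()
    frequent = counts[counts >= min_count].index
    # Rare values are replaced by NaN, so they get no dummy column.
    kept = values.where(values.isin(frequent))
    return pd.get_dummies(kept, prefix=values.name, dtype=int)


# Made-up data: "green" appears only twice and is therefore ignored.
color = pd.Series(["red"] * 6 + ["blue"] * 5 + ["green"] * 2, name="color")
encoded = dummify_frequent(color)
print(encoded.tail(4))
```

With a filter like this, rows that carry a rare category end up with zeros in every dummy column, which is consistent with the all-zero rows discussed above.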