@Sandy4321
Thanks for using PyAF.
This is a Python script that was written some time ago for a model with categorical exogenous data.
https://github.com/antoinecarme/pyaf/blob/master/tests/exog/test_ozone_exogenous_with_categorical.py
PyAF has tests for almost every feature in this directory:
https://github.com/antoinecarme/pyaf/blob/master/tests/
Hope this helps.
Thanks for the quick answer. It is not clear from here: https://github.com/antoinecarme/pyaf/blob/6b652e9d522442c1fb9052dd73b442a1a18e31c7/pyaf/Bench/TS_datasets.py#L151
Do you use one-hot encoding for categorical data? If not, what do you use?
PyAF does categorical encoding in a way that is transparent to the end user. The user only has to provide a pandas exogenous dataframe with categorical columns. These are then detected and encoded internally.
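For readers who want to see what "detected and encoded internally" means in practice, here is a minimal pandas sketch of the same idea. It is not PyAF's code: the helper name `dummify_exogenous` and the sample data are hypothetical. Non-numeric columns are detected from their dtypes and expanded into 0/1 indicator columns, while numeric columns pass through unchanged.

```python
import pandas as pd


def dummify_exogenous(exog: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not PyAF's API): one-hot encode the non-numeric
    columns of an exogenous dataframe, keep numeric columns as they are."""
    # Detect categorical columns as everything that is not numeric.
    categorical_cols = exog.select_dtypes(exclude="number").columns.tolist()
    # Only the listed columns are expanded into 0/1 indicator columns.
    return pd.get_dummies(exog, columns=categorical_cols, dtype=int)


# Hypothetical exogenous data: one numeric and one categorical column.
exog = pd.DataFrame({
    "Temperature": [12.5, 13.0, 11.8, 12.1],
    "Weather": ["sunny", "rainy", "sunny", "cloudy"],
})
encoded = dummify_exogenous(exog)
print(encoded)
```

The user only supplies `exog`; the indicator columns (`Weather_cloudy`, `Weather_rainy`, `Weather_sunny`) are produced without any manual encoding step.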
This is a direct link to the exact code that is used to perform this dummification:
Great, thanks. Then what is the best example file from https://github.com/antoinecarme/pyaf/tree/6b652e9d522442c1fb9052dd73b442a1a18e31c7/notebooks_sandbox to run to see how it works in practice?
Please stop using the sandbox directory. It is for experimental/developer stuff (this is mentioned where possible ;).
Use the docs directory instead.
This notebook is a good starting point:
I see you do use one-hot encoding, but it is not full one-hot, since some columns contain only 0s?
Not sure of the semantics, but when all the indicator columns of a variable are zero, that means it has a missing or unknown/not-declared category value. Pandas allows customizing/ignoring some categories/values when needed.
Also, for significance reasons, some post-processing is performed: infrequent categories (count < 5) are ignored.
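The all-zero case can be reproduced with plain pandas, assuming the categories are declared explicitly through `pd.Categorical`. This is an illustration of the behaviour described above, not PyAF's implementation, and the data is made up. A value outside the declared categories (here `"foggy"`) or a missing value becomes NaN and gets no indicator, so its row is all zeros; a declared category that never occurs (`"cloudy"`) gives a column that is all zeros.

```python
import pandas as pd

# Hypothetical raw values: "foggy" is not declared and None is missing.
weather = pd.Series(["sunny", "rainy", "foggy", None, "sunny"], name="Weather")

# Declared categories (an assumption for this example); any other value becomes NaN.
weather_cat = pd.Categorical(weather, categories=["cloudy", "rainy", "sunny"])

# One indicator column per declared category; NaN rows get no 1 anywhere.
dummies = pd.get_dummies(pd.Series(weather_cat), prefix="Weather", dtype=int)
print(dummies)
```

Rows 2 and 3 (`"foggy"` and the missing value) sum to zero, which is the "not full one-hot" pattern discussed in this thread.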
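The frequency filter can be sketched the same way. The helper `drop_rare_categories` is hypothetical and only mirrors the rule stated above (categories seen fewer than 5 times are ignored); PyAF's actual post-processing may differ in its details. Rare values are replaced by NaN before dummification, so they get no column and their rows are all zeros.

```python
import pandas as pd

MIN_COUNT = 5  # categories with count < 5 are ignored, as stated above


def drop_rare_categories(s: pd.Series, min_count: int = MIN_COUNT) -> pd.Series:
    """Hypothetical helper: replace values of infrequent categories with NaN."""
    counts = s.value_counts()
    frequent = counts[counts >= min_count].index
    # Keep frequent values, turn everything else into NaN.
    return s.where(s.isin(frequent))


# Hypothetical data: "snowy" occurs only twice, so it is dropped.
weather = pd.Series(["sunny"] * 6 + ["rainy"] * 5 + ["snowy"] * 2, name="Weather")
filtered = drop_rare_categories(weather)
dummies = pd.get_dummies(filtered, prefix="Weather", dtype=int)
print(dummies.columns.tolist())
```

Filtering before encoding keeps the model from getting indicator columns that are backed by too few observations to be significant.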
Great code.
"Exogenous variables are dummified for the non-numeric types,"
Can you clarify this, for example with a reference to code for categorical time series data?