Open cryptocoinserver opened 2 years ago
Just a note: there is a danger of lookahead/data leakage when normalization is computed over the whole dataset. It therefore needs to be done carefully inside the environment at each step (with a limited lookback). I saw that some environments already use normalization: https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/203bb7d3f890220bb3e82bc5e34b65051a0b61dc/finrl_meta/env_crypto_trading/env_multiple_crypto.py#L94
According to the paper on ResearchGate (cited below), the tanh estimator is the most promising.
A good explanation of the lookahead problem, suggesting an expanding or rolling window: https://stats.stackexchange.com/questions/442739/look-ahead-bias-induced-by-standardization-of-a-time-series/462976#462976
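To make the rolling-window idea concrete, here is a minimal sketch of a lookahead-safe z-score normalization in pandas. The function name and window size are illustrative, not from the FinRL-Meta code; the key point is that the statistics at step t are computed only from data up to t.

```python
import numpy as np
import pandas as pd

def rolling_zscore(prices: pd.Series, window: int = 50) -> pd.Series:
    """Normalize each value using only the trailing `window` observations.

    Because the rolling mean/std at time t never include data after t,
    this avoids the lookahead bias that whole-dataset normalization causes.
    The first `window - 1` values are NaN (not enough history yet).
    """
    mean = prices.rolling(window, min_periods=window).mean()
    std = prices.rolling(window, min_periods=window).std()
    return (prices - mean) / std

# Example on a synthetic random-walk price series:
prices = pd.Series(np.cumsum(np.random.randn(500)) + 100.0)
z = rolling_zscore(prices, window=50)
```

An expanding window (`prices.expanding(min_periods=window)`) works the same way and uses all history up to t instead of a fixed lookback.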
Yes. It will greatly influence the normalization. What about adding a column 'if_leaking' to flag the affected data? In the normalization process, we would then ignore rows where 'if_leaking' == true. Do you have any ideas on how to solve it?
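A minimal sketch of that 'if_leaking' idea: fit the normalization statistics only on rows that are not flagged, then apply them to the whole frame. The column name comes from the comment above; the function name and min-max choice are illustrative.

```python
import pandas as pd

def fit_minmax_ignoring_leaks(df: pd.DataFrame, col: str):
    """Compute min/max for `col` using only rows not flagged as leaking."""
    clean = df.loc[~df["if_leaking"], col]
    return clean.min(), clean.max()

df = pd.DataFrame({
    "close": [10.0, 12.0, 11.0, 100.0],
    "if_leaking": [False, False, False, True],  # last row is contaminated
})
lo, hi = fit_minmax_ignoring_leaks(df, "close")
df["close_norm"] = (df["close"] - lo) / (hi - lo)
```

Note that this only keeps flagged rows out of the fitted statistics; it does not by itself prevent lookahead if the statistics are still fitted over future rows, so it would be combined with the rolling/expanding-window approach above.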
Adding normalization to the data preprocessor might be a great feature:
Min-Max normalization
Decimal scaling normalization
Z-score normalization
Median normalization
Sigmoid normalization
Tanh estimators
Bhanja, Samit & Das, Abhishek. (2018). Impact of Data Normalization on Deep Neural Network for Time Series Forecasting. ResearchGate
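Of the techniques listed, the tanh estimator is the one the paper finds most promising. A minimal sketch using the standard Hampel-style formula, 0.5 * (tanh(0.01 * (x - mu) / sigma) + 1), which maps values into (0, 1); note that in an environment, mu and sigma would have to come from past data only (see the lookahead discussion above), while here they are taken over the whole array just for illustration.

```python
import numpy as np

def tanh_estimator(x: np.ndarray) -> np.ndarray:
    """Tanh estimator normalization, squashing values into (0, 1).

    Values near the mean map to ~0.5; outliers are compressed smoothly
    toward 0 or 1, which makes this more robust than plain min-max scaling.
    """
    mu, sigma = x.mean(), x.std()
    return 0.5 * (np.tanh(0.01 * (x - mu) / sigma) + 1.0)

data = np.array([10.0, 12.0, 11.0, 100.0])
normed = tanh_estimator(data)
```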
These are more advanced / adaptive approaches: