AI4Finance-Foundation / FinRL-Meta

FinRL-Meta: Dynamic datasets and market environments for FinRL.
https://ai4finance.org
MIT License

[Suggestion] Normalization. #83

Open cryptocoinserver opened 2 years ago

cryptocoinserver commented 2 years ago

Adding normalization to the data preprocessor might be a great feature:

These are more advanced / adaptive approaches:

cryptocoinserver commented 2 years ago

Just a note: normalizing with statistics computed over the whole dataset risks lookahead bias (data leakage). It therefore needs to be done carefully inside the environment at each step, using only a lookback window of past data. I saw that some environments already use normalization: https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/203bb7d3f890220bb3e82bc5e34b65051a0b61dc/finrl_meta/env_crypto_trading/env_multiple_crypto.py#L94
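A minimal sketch of per-step normalization inside an environment, assuming the features are held in a NumPy array of shape `(T, n_features)`. `normalize_observation` and its `lookback` and `eps` parameters are hypothetical names, not FinRL-Meta API. The function z-scores the row at step `t` using only the trailing `lookback` rows, so later values cannot leak into the observation.

```python
import numpy as np


def normalize_observation(data: np.ndarray, t: int, lookback: int = 20, eps: float = 1e-8) -> np.ndarray:
    """Z-score the feature row at step t using only the trailing `lookback` rows.

    data has shape (T, n_features). Only rows up to and including t are read,
    so values from the future cannot influence the observation at step t.
    Hypothetical helper for illustration, not part of FinRL-Meta.
    """
    start = max(0, t - lookback + 1)
    window = data[start : t + 1]            # past and current rows only
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    return (data[t] - mean) / (std + eps)   # eps avoids division by zero


# Usage: at step t the environment would return this vector as its observation.
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
obs = normalize_observation(data, t=3, lookback=3)
```

Changing any row after `t` leaves the result unchanged, which is the property a whole-dataset normalization lacks.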

According to the paper from ResearchGate, the Tanh estimator is the most promising.
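For reference, the commonly cited form of the Tanh estimator is `0.5 * (tanh(0.01 * (x - mu) / sigma) + 1)`, where the original formulation uses robust (Hampel) estimates for `mu` and `sigma`. The sketch below is a simplified, hypothetical variant that substitutes the plain mean and standard deviation of a trailing window, which keeps the no-lookahead property; it is not necessarily what the ResearchGate paper describes, and `tanh_normalize` and its parameters are illustrative names only.

```python
import numpy as np


def tanh_normalize(data: np.ndarray, t: int, lookback: int = 20,
                   scale: float = 0.01, eps: float = 1e-8) -> np.ndarray:
    """Tanh-estimator normalization of row t, computed from trailing rows only.

    Maps values into the open interval (0, 1); 0.5 corresponds to the window mean.
    Plain mean/std stand in for the robust Hampel estimates of the original method.
    """
    start = max(0, t - lookback + 1)
    window = data[start : t + 1]            # no rows after t are used
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    z = (data[t] - mean) / (std + eps)      # ordinary z-score
    return 0.5 * (np.tanh(scale * z) + 1.0) # squash into (0, 1)
```

`scale=0.01` follows the commonly cited formula; with it, outputs stay close to 0.5 unless the z-score is large, so it is best treated as a parameter to tune.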

cryptocoinserver commented 2 years ago

A good explanation of the lookahead problem, suggesting an expanding or rolling window: https://stats.stackexchange.com/questions/442739/look-ahead-bias-induced-by-standardization-of-a-time-series/462976#462976
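The expanding and rolling variants can be sketched with pandas, assuming a single feature stored in a `pd.Series` ordered by time; `expanding_zscore` and `rolling_zscore` are hypothetical helper names. Each value is compared only with observations up to and including its own timestamp, unlike `(s - s.mean()) / s.std()`, which uses the whole column and therefore leaks future information.

```python
import pandas as pd


def expanding_zscore(series: pd.Series, min_periods: int = 2) -> pd.Series:
    """Z-score each value against all observations up to and including it."""
    mean = series.expanding(min_periods=min_periods).mean()
    std = series.expanding(min_periods=min_periods).std()   # sample std (ddof=1)
    return (series - mean) / std


def rolling_zscore(series: pd.Series, window: int = 20) -> pd.Series:
    """Z-score each value against the trailing `window` observations only."""
    mean = series.rolling(window=window, min_periods=window).mean()
    std = series.rolling(window=window, min_periods=window).std()
    return (series - mean) / std


close = pd.Series([1.0, 2.0, 3.0, 4.0])
print(expanding_zscore(close))
print(rolling_zscore(close, window=3))
```

The first rows are NaN until enough observations exist (a warm-up period). An expanding window tends to adapt slowly to regime changes, while a rolling window forgets old data at the cost of noisier estimates.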

zhumingpassional commented 2 years ago

Yes, lookahead leakage will greatly affect the normalization. What about adding a column 'if_leaking' to flag such data? During normalization, we would ignore rows where 'if_leaking' == True. Do you have any other ideas for solving this?
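One possible reading of the 'if_leaking' proposal, sketched with pandas: compute the normalization statistics only from rows where 'if_leaking' is False (for example, the training period) and apply them to every row. The function name `normalize_with_leak_flag` and the assumption that the flag is a boolean column are hypothetical.

```python
from typing import List

import pandas as pd


def normalize_with_leak_flag(df: pd.DataFrame, columns: List[str],
                             flag: str = "if_leaking") -> pd.DataFrame:
    """Z-score `columns` using statistics from rows where `flag` is False only.

    Rows flagged as leaking are still normalized, but they never contribute
    to the mean or standard deviation. Assumes `flag` is a boolean column.
    """
    reference = df.loc[~df[flag], columns]   # rows allowed to define the stats
    mean = reference.mean()
    std = reference.std()
    out = df.copy()
    out[columns] = (df[columns] - mean) / std
    return out


df = pd.DataFrame({"close": [1.0, 2.0, 3.0, 1000.0],
                   "if_leaking": [False, False, False, True]})
print(normalize_with_leak_flag(df, ["close"]))
```

This uses one fixed set of statistics, so it only prevents leakage if every row lying in the "future" relative to those statistics is flagged; the per-step window approaches discussed above avoid that bookkeeping.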