AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

Reproducibility of DRLs #190

Closed. diffunity closed this issue 3 years ago.

diffunity commented 3 years ago

I was running experiments with simple OHLCV features for the DDPG algorithm, and I wanted to reproduce the results. I know DRL is not deterministic and I cannot reproduce the results exactly, but is it supposed to fluctuate this much? The resulting Sharpe ratio ranges from 0.1 to 2.7 across numerous re-runs.

I followed the tutorial exactly; the only changes I made were to the dates and the "tech_indicator_list" (which I changed to OHLCV). I have tried setting the seeds in all possible places (random.seed, np.random.seed, torch.manual_seed, Stable-Baselines seeds, gym env seeds, gym env action-space seeds, gym DummyVecEnv seeds).

Below are two very different results I obtained from running the exact same experiment setup:

==============Get Backtest Results===========
Annual return          0.008181
Cumulative returns     0.008181
Annual volatility      0.261993
Sharpe ratio           0.162571
Calmar ratio           0.027933
Stability              0.447138
Max drawdown          -0.292874
Omega ratio            1.027448
Sortino ratio          0.221158
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.023016
Daily value at risk   -0.032839
dtype: float64

==============Get Backtest Results===========
Annual return          0.322901
Cumulative returns     0.322901
Annual volatility      0.120866
Sharpe ratio           2.385978
Calmar ratio           4.932795
Stability              0.876256
Max drawdown          -0.065460
Omega ratio            1.483569
Sortino ratio          3.570518
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.142228
Daily value at risk   -0.014083
dtype: float64

Below is how I set the seeds:

# set seeds
import os
import random

import numpy as np
import torch
from stable_baselines3.common.utils import set_random_seed

os.environ['PYTHONHASHSEED'] = str(42)

random.seed(42)
set_random_seed(42)
np.random.seed(42)
torch.manual_seed(42)

...

e_train_gym = StockTradingEnv(df=train, **env_kwargs)
e_train_gym.seed(42)
e_train_gym.action_space.seed(42)
env_train, _ = e_train_gym.get_sb_env()
env_train.seed(42)
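
For completeness, a minimal sketch of two further determinism knobs that the snippet above does not set, assuming the agent is a Stable-Baselines3 DDPG model (the tutorial's DRLAgent wrapper may construct it differently): the model's own seed argument and PyTorch's cuDNN flags.

# Sketch only: assumes the agent is created directly via SB3's DDPG rather
# than through FinRL's DRLAgent wrapper used in the tutorial.
import torch
from stable_baselines3 import DDPG

# cuDNN may select nondeterministic kernels unless forced otherwise.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)

# SB3 models accept their own seed, which covers policy weight
# initialization and the RNGs the model itself uses.
model = DDPG("MlpPolicy", env_train, seed=42, verbose=0)
model.learn(total_timesteps=10_000)

Even with all of these set, GPU training is not guaranteed to be bit-for-bit reproducible, so some run-to-run variance can remain.
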
YangletLiu commented 3 years ago

Thanks for your report! It would be important to improve reproducibility.

karljmurphy commented 3 years ago

I've noticed this issue too. Why do the results differ even when the seed values have been set? And why close the issue without any conversation about how improvements could be made? It is a fantastic project, but closing issues like this is not the way to improve it.

YangletLiu commented 3 years ago

Stability and reproducibility are two common issues for the RL community, and they are genuinely difficult. If you run more experiments, you will likely find that there is no single answer to give in this issue; even if someone shares tuning tricks, they may be misleading.

"Reproducibility of DRLs" is the current hot topic in research, I believe it is better to follow the most recent skills from leading groups, say DeepMind, OpenAI, etc.

karljmurphy commented 3 years ago

Ok - thanks for your answer. That is good to know. Cheers