AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

Baseline code doesn't match paper numbers #746

Open alexholdenmiller opened 1 year ago

alexholdenmiller commented 1 year ago

Describe the bug
The baseline numbers don't quite match the paper's numbers. They're close, but I'm not sure why they aren't exact... am I doing anything wrong?

To Reproduce
Steps to reproduce the behavior:

from finrl.plot import backtest_stats, get_baseline

start = "2016-01-04"
end = "2020-05-08"
baseline_df = get_baseline(ticker="^DJI", start=start, end=end)
stats = backtest_stats(baseline_df, value_col_name="close")
print(stats)

Expected behavior
Copied from the paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3690996

Annual return: 0.078
Cumulative returns: 0.386
Annual volatility:  0.201
Sharpe ratio: 0.47
Max Drawdown: -0.371

Actual behavior

Annual return          0.079210
Cumulative returns     0.392266
Annual volatility      0.200459
Sharpe ratio           0.481570
Max drawdown          -0.370862

YangletLiu commented 1 year ago

Oh, this happens quite often with deep neural networks (DRL algorithms). There are several sources of randomness (random batch sampling, SGD, optimizer initialization, etc.), so the testing results will differ from run to run, even with identical settings. The values above are close, which is a good indicator of reproducibility.
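
If you want the DRL side of the comparison to be deterministic, a minimal sketch (assuming the stable_baselines3 training path from the tutorials, not necessarily the exact setup used for the paper) is to pin the seed everywhere:

# Sketch: pinning random seeds for a stable_baselines3 PPO run.
# env_train is whatever StockTradingEnv instance you already built.
from stable_baselines3 import PPO
from stable_baselines3.common.utils import set_random_seed

SEED = 0
set_random_seed(SEED)  # seeds python, numpy and torch RNGs
model = PPO("MlpPolicy", env_train, seed=SEED, verbose=0)
model.learn(total_timesteps=50_000)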

Are you talking about the baseline?
A possible reason is that we have updated the code several times, so it differs slightly from the version used in that paper.

alexholdenmiller commented 1 year ago

True, that's the case for NNs, but this is just ^DJI, so it should be fully deterministic given the trading data and the formulas for computing e.g. annual return and Sharpe ratio (I don't see any source of variability in that). Indeed, I get the same result every time I run the program.
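
For concreteness, this is the kind of deterministic computation I mean (my own rough reimplementation, assuming daily closes and 252 trading days per year; not necessarily identical to what backtest_stats does internally):

# Rough sketch of the baseline metrics, computed directly from the close prices.
import numpy as np
import pandas as pd

def baseline_stats(close: pd.Series) -> dict:
    rets = close.pct_change().dropna()
    n_years = len(rets) / 252
    cumulative = close.iloc[-1] / close.iloc[0] - 1
    return {
        "Annual return": (1 + cumulative) ** (1 / n_years) - 1,
        "Cumulative returns": cumulative,
        "Annual volatility": rets.std() * np.sqrt(252),
        "Sharpe ratio": rets.mean() / rets.std() * np.sqrt(252),
        "Max drawdown": (close / close.cummax() - 1).min(),
    }

print(baseline_stats(baseline_df["close"]))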

Would you recommend I refer to a more recent paper, one more likely to have used the current code? I'm trying to reproduce the results, hopefully the neural-net ones next.

alexholdenmiller commented 1 year ago

Could you share the hyper-parameters used to train the top-performing RL model (e.g. PPO), either in that work or a more recent one?

alexholdenmiller commented 1 year ago

@XiaoYangLiu-FinRL any thoughts?

zhumingpassional commented 1 year ago

DRL algorithms such as PPO are stochastic. The hyperparameters are listed in the tutorials, please check them. Please refer to this notebook: https://github.com/AI4Finance-Foundation/FinRL-Tutorials/blob/master/1-Introduction/Stock_NeurIPS2018_SB3.ipynb
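
The training call in that notebook follows roughly this shape (a sketch; the model_kwargs values shown here are placeholders, the notebook itself is the authoritative source for the actual hyperparameters):

# Sketch of the SB3-based training path used in the FinRL tutorials.
# env_train is the StockTradingEnv built from the training split.
from finrl.agents.stablebaselines3.models import DRLAgent

agent = DRLAgent(env=env_train)
ppo_params = {  # illustrative placeholders, not the paper's values
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 2.5e-4,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo", model_kwargs=ppo_params)
trained_ppo = agent.train_model(model=model_ppo, tb_log_name="ppo", total_timesteps=50_000)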

alexholdenmiller commented 1 year ago

Yes, I've gone through that notebook. I'm running basically the same code as that notebook on the dates mentioned in this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3690996

The baseline results are very different (27% annual return in the notebook vs. 8% here) because of the different dates (buy-and-hold does much better over the notebook's dates). However, retraining the PPO algorithm with the parameters from the notebook (and many other parameter settings) consistently gives quite poor results.

Some other questions for you then...

Are these the right indicators to use? These are the defaults (how I pass them in is sketched at the end of this comment): INDICATORS = ["macd", "boll_ub", "boll_lb", "rsi_30", "cci_30", "dx_30", "close_30_sma", "close_60_sma"]

Did you notice divide by zero errors as well? https://github.com/AI4Finance-Foundation/FinRL/issues/772

Did you modify the policy (i.e. to something other than MlpPolicy) argument to the models from stable_baselines3?
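
For context, this is how I'm feeding those indicators in, which as far as I can tell mirrors the tutorial (a sketch; raw_df is the YahooDownloader output):

# Sketch of the preprocessing step with the default indicator list.
from finrl.config import INDICATORS
from finrl.meta.preprocessor.preprocessors import FeatureEngineer

fe = FeatureEngineer(
    use_technical_indicator=True,
    tech_indicator_list=INDICATORS,  # the defaults quoted above
    use_vix=True,
    use_turbulence=True,
    user_defined_feature=False,
)
processed = fe.preprocess_data(raw_df)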

zhumingpassional commented 1 year ago

The indicators are right.

PPO is a stochastic strategy, so the results may differ even with the same hyperparameters.

I noticed the divide-by-zero errors too. I guess some of the downloaded raw data may be missing.

Did you modify the policy (i.e. to something other than MlpPolicy) argument to the models from stable_baselines3? Response: no.
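
One quick way to check the missing-data guess is to scan the downloaded frame for gaps before feature engineering (a sketch, assuming a YahooDownloader-style frame with date, tic and OHLCV columns):

# Sketch: look for holes in the raw data that could trigger divide-by-zero
# during indicator computation.
import pandas as pd

def report_gaps(df: pd.DataFrame) -> None:
    print("rows with NaNs:", int(df.isna().any(axis=1).sum()))
    for col in ("open", "high", "low", "close", "volume"):
        if col in df.columns:
            print(f"zero values in {col}:", int((df[col] == 0).sum()))
    # Per-ticker row counts; a mismatch means missing trading days for some tickers.
    print(df.groupby("tic").size().describe())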

alexholdenmiller commented 1 year ago

Yes, I understand PPO is stochastic, but in hundreds of runs (including with hyperparameters that improve my results) I haven't achieved a single run as good as the one you reported, and I am directly searching over hyperparameters using the test set. So I suspect something isn't quite running correctly (e.g. maybe I changed something by mistake). I'm trying to debug this, but I also wanted to make 100% sure I'm using the right arguments. Thanks for the responses, I'll keep digging.

alexholdenmiller commented 1 year ago

Your paper mentions that you continue to update the model weights during the test period. It appears the notebook does not implement this: it creates a training environment from the training data and trains the model, then uses a separate test environment for prediction and reporting results.

Do you have the code implementing the "continue updating during test" scenario? That may explain the significantly worse results I'm getting compared to what is reported in your paper.
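
In case it helps pin down what I mean, this is roughly what I would try to implement: a rolling retrain loop over the test window (a sketch based on my guess at the setup, not the paper's code; processed, env_kwargs and trained_ppo are assumed to come from the tutorial pipeline, and the monthly retrain frequency and 2009-01-01 training start are my own choices):

# Sketch: keep updating the weights during the 2016-01-04 to 2020-05-08 backtest
# window by retraining on an expanding window, then trading the next month.
import pandas as pd
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.meta.preprocessor.preprocessors import data_split
from finrl.agents.stablebaselines3.models import DRLAgent

retrain_dates = list(pd.date_range("2016-01-04", "2020-05-08", freq="MS").strftime("%Y-%m-%d"))
results = []
for i in range(len(retrain_dates) - 1):
    train = data_split(processed, "2009-01-01", retrain_dates[i])
    trade = data_split(processed, retrain_dates[i], retrain_dates[i + 1])

    env_train, _ = StockTradingEnv(df=train, **env_kwargs).get_sb_env()
    trained_ppo.set_env(env_train)
    # Continue from the existing weights instead of re-initializing.
    trained_ppo.learn(total_timesteps=10_000, reset_num_timesteps=False)

    env_trade = StockTradingEnv(df=trade, **env_kwargs)
    account_value, actions = DRLAgent.DRL_prediction(model=trained_ppo, environment=env_trade)
    results.append(account_value)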