alexholdenmiller opened this issue 1 year ago
This happens quite often with deep neural networks (DRL algorithms). Random seeds affect batch sampling, SGD, the optimizer, etc., so the test results will differ between runs even with identical settings. The values above are similar, which is a good indicator of reproducibility.
Are you asking about the baseline?
A possible reason is that we have updated the code several times, so it differs slightly from the version used in that paper.
True, that's the case for NNs, but this is just ^DJI, so it should be fully deterministic given the trading data and the algorithm for computing e.g. annual return and Sharpe ratio (I don't see a reason for variability in any of that). Certainly, I get the same result every time I run the program.
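For what it's worth, here is roughly how I'd expect those baseline metrics to be computed. The helper names are my own (not FinRL's), and I'm assuming 252 trading days per year and a zero risk-free rate:

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor

def annual_return(prices):
    """Annualized return of a buy-and-hold position over a daily price series."""
    n_days = len(prices) - 1
    total = prices[-1] / prices[0]
    return total ** (TRADING_DAYS / n_days) - 1

def sharpe_ratio(prices):
    """Annualized Sharpe ratio of daily returns, zero risk-free rate assumed."""
    rets = np.diff(prices) / prices[:-1]
    return np.sqrt(TRADING_DAYS) * rets.mean() / rets.std()

prices = np.array([100.0, 101.0, 99.5, 102.0, 103.5])
# Same inputs always give the same outputs -- no randomness anywhere.
print(annual_return(prices), sharpe_ratio(prices))
```

Nothing in this path involves a random seed, which is why I'd expect the baseline numbers to match the paper exactly.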
Would you recommend I refer to a more recent paper that is more likely to have used the newer code? I'm trying to reproduce the results, hopefully the neural-net ones next.
Could you share the hyper-parameters used to train the top-performing RL model (e.g. PPO), either in that work or a more recent one?
@XiaoYangLiu-FinRL any thoughts?
DRL algorithms such as PPO are stochastic. The hyperparameters are listed in the tutorials; please check them. Please refer to this notebook: https://github.com/AI4Finance-Foundation/FinRL-Tutorials/blob/master/1-Introduction/Stock_NeurIPS2018_SB3.ipynb
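For concreteness, the PPO configuration in the tutorials looks roughly like the dictionary below. These values mirror FinRL's `PPO_PARAMS` defaults as I remember them, so verify them against the notebook before relying on them:

```python
# Sketch of wiring explicit hyperparameters into a stable_baselines3 PPO model.
# The values below are my recollection of FinRL's PPO_PARAMS defaults --
# check the linked notebook for the authoritative ones.
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.00025,
    "batch_size": 128,
}

# With stable_baselines3 installed, these would be passed straight through:
# from stable_baselines3 import PPO
# model = PPO("MlpPolicy", env, seed=0, **PPO_PARAMS)
print(PPO_PARAMS)
```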
Yes, I've gone through that notebook. I'm running basically the same code as that notebook on the dates mentioned in this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3690996
The baseline results are very different (27% annual return in the notebook vs 8%), due to the different dates (buy and hold being much better in the notebook results). However, retraining the PPO algorithm using the parameters from the notebook (and many other parameters) consistently gives quite poor results.
Some other questions for you then...
Are these the right indicators to use? These are the defaults: INDICATORS = [ "macd", "boll_ub", "boll_lb", "rsi_30", "cci_30", "dx_30", "close_30_sma", "close_60_sma", ]
Did you notice divide by zero errors as well? https://github.com/AI4Finance-Foundation/FinRL/issues/772
Did you modify the policy (i.e. to something other than MlpPolicy) argument to the models from stable_baselines3?
The indicators are right.
PPO is a stochastic strategy, and the results may differ even when using the same hyperparameters.
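If run-to-run variance is the concern, fixing the seeds at least makes individual runs repeatable. A minimal sketch (for the torch-backed stable_baselines3 networks you would also need `torch.manual_seed`, omitted here to keep the example dependency-free):

```python
import random
import numpy as np

def set_global_seeds(seed):
    """Seed Python's and NumPy's RNGs. torch.manual_seed(seed) would also be
    needed for stable_baselines3's networks (omitted here)."""
    random.seed(seed)
    np.random.seed(seed)

set_global_seeds(42)
a = np.random.rand(3)
set_global_seeds(42)
b = np.random.rand(3)
print(np.allclose(a, b))  # re-seeding reproduces the same draws
```

stable_baselines3 models also accept a `seed` argument at construction, which is the cleaner way to pin down a single training run.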
I noticed the divide-by-zero errors too. I suspect some of the raw downloaded data may be missing.
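If missing rows in the download are the cause, one guard is to forward-fill gaps before any ratio is computed. A rough sketch in pure Python (the real pipeline uses pandas, where `DataFrame.ffill` does the same job; the function name here is hypothetical):

```python
def clean_prices(prices, eps=1e-8):
    """Forward-fill missing (None) or non-positive prices so that later
    ratio computations (returns, indicators) never divide by zero."""
    cleaned = []
    last = None
    for p in prices:
        if p is None or p <= eps:
            if last is None:
                raise ValueError("series starts with missing data")
            p = last  # forward-fill from the previous valid price
        cleaned.append(p)
        last = p
    return cleaned

raw = [100.0, None, 0.0, 101.5, 102.0]
print(clean_prices(raw))  # [100.0, 100.0, 100.0, 101.5, 102.0]
```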
Did you modify the policy (i.e. to something other than MlpPolicy) argument to the models from stable_baselines3? Response: no.
Yes, I understand PPO is stochastic, but in hundreds of runs (including with hyperparameters that improve my results) I haven't achieved a single one as good as the one you reported, even though I am directly searching over hyperparameters using the test set. So I suspect something isn't running correctly (e.g. maybe I changed something by mistake). I'm trying to debug this, but I also wanted to make 100% sure I'm using the right arguments. Hm... thanks for the responses, I'll keep digging.
Your paper mentions that you continue to update the model weights during the test period. It appears the notebook does not implement this: it creates a training environment from the training data and trains the model, then uses a separate test environment for the prediction task and reporting results.
Do you have the code implementing the "continue updating during test" scenario? That may explain the significantly worse results I'm getting compared to what is reported in your paper.
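In case it helps clarify what I mean, here is the shape of the loop I would expect, with a trivial stub standing in for the real model and environments. All names here are hypothetical, not FinRL or stable_baselines3 APIs:

```python
# Sketch of "continue updating during test": after each test window is
# traded, that window is appended to the training data and the model is
# trained further before trading the next window.

def rolling_evaluation(model, train_data, test_data, window=63):
    history = list(train_data)
    results = []
    for start in range(0, len(test_data), window):
        chunk = test_data[start:start + window]
        results.extend(model.predict_many(chunk))  # trade this window
        history.extend(chunk)                      # fold it into training data
        model.fit(history)                         # continue updating weights
    return results

class StubModel:
    """Trivial stand-in that 'predicts' the mean of its training data."""
    def __init__(self):
        self.mean = 0.0
    def fit(self, data):
        self.mean = sum(data) / len(data)
    def predict_many(self, chunk):
        return [self.mean for _ in chunk]

model = StubModel()
model.fit([1.0, 2.0, 3.0])
out = rolling_evaluation(model, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], window=2)
print(out)
```

The key point is that predictions for each window come from a model that has already seen all earlier test windows, which is not what the notebook's single train-then-predict split does.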
Describe the bug
Baseline numbers don't quite match the paper numbers. They're close, but I'm not sure why they aren't exact... am I doing anything wrong?
Expected behavior
Copied from the paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3690996