Closed Kismuz closed 5 years ago
Hi, @Kismuz . Just being curious, does this mean that you have made good progress on overcoming the generalization issue and are nearly ready to launch out-of-sample live testing? Thanks!
@mysl, In brief:
good progress on overcoming the generalization issue
yes, indeed it seems we've got some kind of breakthrough here with a combined model-based/model-free approach and a constrained problem setup; still at an early research stage and not ready to publish results
near ready for launching out of sample live testing
no, I don't think so. Backtest results are still unstable and a lot of work needs to be done before one can hope for robust live results
Expanded:
exactly as in other real-world domains, we face an inability to gather a sufficient amount of real experience (or backtest data, in our case) given the low sample efficiency of current DRL algorithms; one established approach is to build a decent model of the environment and train the agent on generated data, hoping it will do well on the real environment. A big plus is that the amount of data is infinite, which makes generalisation natural; a big drawback is model inaccuracy (aka model bias), resulting in lower asymptotic performance than the model-free approach. There are methods to remove this bias (meta-learning is one popular choice). Tons of papers on it recently (see some below):
""" ...The now standard approach to model-based reinforcement learning is to use nite set of empirically collected data Dn to approximate the true transition probability distribution P by model ^ P and the expected reward function r by ^r. The learned model is then used to generate new samples. A standard RL algorithm can use these samples to find a close to optimal policy.... """ [from somewhere]
In our case, building such a model means building a model that generates backtest data (assuming the backtrader simulator itself is a near-optimal model of a real broker execution engine, at least at this point). My approach is to build a class of probabilistic generative models, fit it to the available real training data, and use it to get cheap training trajectories which are statistically identical to the real data. Here I actually mean statistically identical given the problem objective, which naturally leads us to...
to be able to get a tractable data model, it is essential to impose some constraints on the 'trading problem' itself. One sensible approach is to limit oneself to the mean-reverting trading paradigm. For example, we aim to find an optimal policy among the class of mean-reverting policies, given a pair of [potentially] co-integrated assets, by posing constraints on agent actions: we only allow opening/closing in opposite directions. Such a constraint makes the data modelling problem tractable: one can fit models like Ornstein-Uhlenbeck, CIR, etc., which have some nice properties (Markovian, stationary, etc.).
I have some implementations here: https://github.com/Kismuz/btgym/blob/master/btgym/research/model_based/model.py
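To illustrate the idea (this is a self-contained sketch, not the implementation in `model.py`): an Ornstein-Uhlenbeck process dX = θ(μ − X)dt + σ dW can be fitted to a spread-like series via its exact AR(1) discretisation, and the fitted parameters then generate unlimited synthetic trajectories with the same statistics. All names below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ou(theta, mu, sigma, x0, n, dt=1.0, rng=rng):
    """Exact-discretisation sampler for dX = theta*(mu - X)dt + sigma*dW."""
    x = np.empty(n)
    x[0] = x0
    a = np.exp(-theta * dt)                      # AR(1) coefficient
    noise_std = sigma * np.sqrt((1 - a**2) / (2 * theta))
    for t in range(1, n):
        x[t] = mu + a * (x[t - 1] - mu) + noise_std * rng.standard_normal()
    return x

def fit_ou(x, dt=1.0):
    """Fit OU parameters by OLS on the implied AR(1): x[t] = c + b*x[t-1] + e."""
    b, c = np.polyfit(x[:-1], x[1:], 1)          # slope, intercept
    theta = -np.log(b) / dt
    mu = c / (1 - b)
    resid = x[1:] - (c + b * x[:-1])
    sigma = np.sqrt(resid.var() * 2 * theta / (1 - b**2))
    return theta, mu, sigma

# Stand-in for real spread data from a co-integrated pair:
real = simulate_ou(theta=0.1, mu=0.5, sigma=0.2, x0=0.0, n=20000)
theta_hat, mu_hat, sigma_hat = fit_ou(real)
# Fitted model now serves as a generator of cheap training trajectories:
synthetic = simulate_ou(theta_hat, mu_hat, sigma_hat, x0=real[-1], n=20000)
```

The Markovian, stationary structure of the OU process is what makes this fit-then-generate loop well posed; the actual models in the repo are richer, but follow the same pattern.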
an active area of research (links below); I haven't even touched that yet; most of my research and coding work for the last two months has been more econometric than RL-related;
Pieter Abbeel et al.,"Using Inaccurate Models in Reinforcement Learning," in Proceedings of the 23rd international conference on Machine learning, 2006
Ignasi Clavera et al., "Model-Based Reinforcement Learning via Meta-Policy Optimization," arXiv preprint arXiv:1809.05214, 2018
Balazs Csanad Csaji et al., "Value Function Based Reinforcement Learning in Changing Markovian Environments," in Journal of Machine Learning Research 9, 2008
Amir-Massoud Farahmand et al., "Value-Aware Loss Function for Model-based Reinforcement Learning," Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017
Yuping Luo et al., "Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees," arXiv preprint arXiv:1807.03858, 2018
Anusha Nagabandi et al., "Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning," arXiv preprint arXiv:1708.02596, 2017
Iulian Vlad Serban et al., "The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach," arXiv preprint arXiv:1807.04723, 2018
Harris, D., "Principal components analysis of cointegrated time series," in Econometric Theory, v.13, 1997
Tim Leung, Xin Li, "Optimal Mean Reversion Trading: Mathematical Analysis and Practical Applications", 2016
Alexandre d'Aspremont, "Identifying Small Mean Reverting Portfolios," 2007
Marco Cuturi, Alexandre d'Aspremont: "Mean-Reverting Portfolios: Tradeoffs Between Sparsity and Volatility", arXiv preprint arXiv:1509.05954, 2015
@Kismuz , brilliant! Sounds similar to the Dyna approach to combining model-based/model-free algorithms. Thank you so much for the detailed explanation! And a lot of interesting papers to read as well :-)
@Kismuz I think the new update has a small problem.
Before the update, the 'train' dir contained the *.pbtxt file that TensorFlow uses to show the computational graph.
Now I have the 'worker_x' directory but not the 'train' equivalent.
@JacobHanouna, thanks for spotting; fixed graph vis.
Save, restore or resume trained models:
Launcher class got new logic regarding model parameters handling:
- one can now easily load a pre-trained model via the `cluster_config` --> `initial_ckpt_dir` arg at Launcher starting routine;
- added helper `launcher.export_checkpoint()` method: saves most recent trained model parameters to a user-defined external directory.

Notes:
- the `Override[y/n]?` prompt affects `log_dir` content only;
- see added args in: https://github.com/Kismuz/btgym/blob/master/examples/unreal_stacked_lstm_strat_4_11.ipynb
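A hypothetical usage sketch of the new checkpoint logic. Only `initial_ckpt_dir` inside `cluster_config` and the `launcher.export_checkpoint()` helper come from the note above; every other key, value, and the `Launcher(...)` call shape are placeholders, not btgym's confirmed API surface (see the linked notebook for the actual args).

```python
# Placeholder config fragment; real cluster_config has more keys
# (see the linked example notebook for the actual set of args).
cluster_config = dict(
    initial_ckpt_dir='./pre_trained_model',  # load these weights at start-up
)

# Hypothetical usage, signatures not verified against the repo:
# launcher = Launcher(cluster_config=cluster_config, ...)
# launcher.run()
# launcher.export_checkpoint('./my_saved_model')  # persist latest parameters
```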