Closed Kismuz closed 5 years ago
Hi, @Kismuz . Just being curious, does this mean that you have made good progress on overcoming the generalization issue and are nearly ready to launch out-of-sample live testing? Thanks!
@mysl, In brief:
good progress on overcoming the generalization issue
yes, indeed it seems we've got some kind of breakthrough here with a combined model-based/model-free approach and a constrained problem setup; still at an early research stage and not ready to publish results
near ready for launching out of sample live testing
no, I don't think so. Backtest results are still unstable and a lot of work needs to be done before one can hope for robust live results
Expanded:
exactly as in other real-world domains, we face an inability to gather a sufficient amount of real experience (or backtest data, in our case) given the low sample efficiency of current DRL algorithms; one established approach is to build a decent model of the environment and train the agent on generated data, hoping it will do well on the real environment. A big plus is that the amount of data is infinite, which makes generalisation natural; a big drawback is model inaccuracy (aka model bias), resulting in lower asymptotic performance than the model-free approach. There are methods to remove this bias (meta-learning is one popular choice). Tons of papers on it recently (see some below):
""" ...The now standard approach to model-based reinforcement learning is to use nite set of empirically collected data Dn to approximate the true transition probability distribution P by model ^ P and the expected reward function r by ^r. The learned model is then used to generate new samples. A standard RL algorithm can use these samples to find a close to optimal policy.... """ [from somewhere]
In our case, building such a model means building a model that generates backtest data (assuming the backtrader simulator itself is a near-optimal model of a real broker execution engine, at least at this point). My approach is to build a class of probabilistic generative models, fit it to the available real training data, and use it to get cheap training trajectories which are statistically identical to the real data. Here I actually mean statistically identical given the problem objective, which naturally leads us to...
to be able to get a tractable data model, it is essential to impose some constraints on the 'trading problem' itself. One sensible approach is to limit oneself to the mean-reverting trading paradigm. For example, we aim to find an optimal policy among the class of mean-reverting policies, given a pair of [potentially] co-integrated assets, by posing constraints on agent actions: we only allow opening/closing in opposite directions. Such a constraint makes the data modelling problem tractable: one can fit models like Ornstein-Uhlenbeck, CIR, etc., which have some nice properties (Markovian, stationary, etc.).
I have some implementations here: https://github.com/Kismuz/btgym/blob/master/btgym/research/model_based/model.py
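To illustrate the idea (this is a self-contained sketch, not the implementation in `model.py`): an Ornstein-Uhlenbeck process dX = θ(μ − X)dt + σ dW can be fitted to a spread-like series via its exact AR(1) discretisation, and the fitted parameters then generate unlimited synthetic trajectories with the same statistics. All names below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ou(theta, mu, sigma, x0, n, dt=1.0, rng=rng):
    """Exact-discretisation sampler for dX = theta*(mu - X)dt + sigma*dW."""
    x = np.empty(n)
    x[0] = x0
    a = np.exp(-theta * dt)                      # AR(1) coefficient
    noise_std = sigma * np.sqrt((1 - a**2) / (2 * theta))
    for t in range(1, n):
        x[t] = mu + a * (x[t - 1] - mu) + noise_std * rng.standard_normal()
    return x

def fit_ou(x, dt=1.0):
    """Fit OU parameters by OLS on the implied AR(1): x[t] = c + b*x[t-1] + e."""
    b, c = np.polyfit(x[:-1], x[1:], 1)          # slope, intercept
    theta = -np.log(b) / dt
    mu = c / (1 - b)
    resid = x[1:] - (c + b * x[:-1])
    sigma = np.sqrt(resid.var() * 2 * theta / (1 - b**2))
    return theta, mu, sigma

# Stand-in for real spread data from a co-integrated pair:
real = simulate_ou(theta=0.1, mu=0.5, sigma=0.2, x0=0.0, n=20000)
theta_hat, mu_hat, sigma_hat = fit_ou(real)
# Fitted model now serves as a generator of cheap training trajectories:
synthetic = simulate_ou(theta_hat, mu_hat, sigma_hat, x0=real[-1], n=20000)
```

The Markovian, stationary structure of the OU process is what makes this fit-then-generate loop well posed; the actual models in the repo are richer, but follow the same pattern.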
an active area of research (links below); I haven't even touched that yet; most of my research and coding work for the last two months has been more econometric than RL-related;
Pieter Abbeel et al.,"Using Inaccurate Models in Reinforcement Learning," in Proceedings of the 23rd international conference on Machine learning, 2006
Ignasi Clavera et al., "Model-Based Reinforcement Learning via Meta-Policy Optimization," arXiv preprint arXiv:1809.05214, 2018
Balazs Csanad Csaji et al., "Value Function Based Reinforcement Learning in Changing Markovian Environments," in Journal of Machine Learning Research 9, 2008
Amir-Massoud Farahmand et al., "Value-Aware Loss Function for Model-based Reinforcement Learning," Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017
Yuping Luo et al., "Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees," arXiv preprint arXiv:1807.03858, 2018
Anusha Nagabandi et al., "Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning," arXiv preprint arXiv:1708.02596, 2017
Iulian Vlad Serban et al., "The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach," arXiv preprint arXiv:1807.04723, 2018
Harris, D., "Principal components analysis of cointegrated time series," in Econometric Theory, v.13, 1997
Tim Leung, Xin Li, "Optimal Mean Reversion Trading: Mathematical Analysis and Practical Applications", 2016
Alexandre d'Aspremont, "Identifying Small Mean Reverting Portfolios," 2007
Marco Cuturi, Alexandre d'Aspremont: "Mean-Reverting Portfolios: Tradeoffs Between Sparsity and Volatility", arXiv preprint arXiv:1509.05954, 2015
@Kismuz , brilliant! Sounds similar to the Dyna approach to combining model-based/model-free algorithms. Thank you so much for the detailed explanation! And a lot of interesting papers to read as well :-)
@Kismuz I think the new update has a small problem.
Before the update, the 'train' dir contained the *.pbtxt file that TensorFlow uses to show the computational graph.
Now I have the 'worker_x' directory but not the 'train' equivalent.
@JacobHanouna, thanks for spotting; fixed graph vis.
Save, restore or resume trained models:
Launcher class got new logic regarding model parameters handling:
- one can now easily load a pre-trained model via the `cluster_config` --> `initial_ckpt_dir` arg at Launcher starting routine;
- added helper `launcher.export_checkpoint()` method: saves most recent trained model parameters to a user-defined external directory.

Notes:
- the `Override[y/n]?` prompt affects `log_dir` content only;
- see added args in: https://github.com/Kismuz/btgym/blob/master/examples/unreal_stacked_lstm_strat_4_11.ipynb
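A hypothetical usage sketch of the new checkpoint logic. Only `initial_ckpt_dir` inside `cluster_config` and the `launcher.export_checkpoint()` helper come from the note above; every other key, value, and the `Launcher(...)` call shape are placeholders, not btgym's confirmed API surface (see the linked notebook for the actual args).

```python
# Placeholder config fragment; real cluster_config has more keys
# (see the linked example notebook for the actual set of args).
cluster_config = dict(
    initial_ckpt_dir='./pre_trained_model',  # load these weights at start-up
)

# Hypothetical usage, signatures not verified against the repo:
# launcher = Launcher(cluster_config=cluster_config, ...)
# launcher.run()
# launcher.export_checkpoint('./my_saved_model')  # persist latest parameters
```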