Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

Saving, restoring and using pre-trained models #88

Closed Kismuz closed 5 years ago

Kismuz commented 5 years ago

Save, restore or resume trained models:

The Launcher class got new logic for model parameter handling: one can now easily load a pre-trained model via the cluster_config --> initial_ckpt_dir arg.

Launcher starting routine:

  1. if initial_ckpt_dir is given, try to load the pre-trained model and start at step=0 if it succeeds;
  2. if that fails, look for a routinely saved checkpoint and, if one is found, resume training at the step stored in that checkpoint;
  3. if that also fails, start training from scratch.
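The fallback order above can be sketched as follows (helper names and the checkpoint-lookup representation are illustrative, not btgym's actual internals):

```python
def resolve_start_checkpoint(initial_ckpt_dir, current_ckpt_dir, available):
    """Pick which checkpoint (if any) to start from, mirroring the
    Launcher fallback order described above.

    `available` is a dict mapping directory -> (ckpt_path, saved_global_step),
    standing in for a real on-disk checkpoint lookup. Illustrative only.
    Returns (checkpoint_path_or_None, global_step_to_start_from).
    """
    # 1. A pre-trained model takes priority; training restarts at step 0.
    if initial_ckpt_dir is not None and initial_ckpt_dir in available:
        path, _ = available[initial_ckpt_dir]
        return path, 0
    # 2. Otherwise resume from the routinely saved checkpoint, if any,
    #    continuing from the global_step stored in it.
    if current_ckpt_dir in available:
        return available[current_ckpt_dir]
    # 3. Nothing found: start training from scratch.
    return None, 0
```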

Added a helper launcher.export_checkpoint() method that saves the most recent trained model parameters to a user-defined external directory.

Notes:

  1. when loading a pre-trained model, training starts at global_step=0; when restoring from a current checkpoint, training resumes from the last saved global_step value;
  2. answering Yes to Launcher's Override[y/n]? prompt affects log_dir content only;
  3. Launcher now has a 'save_secs' arg defining how often checkpoints are written; the default value is 600 sec;
  4. exporting a checkpoint overrides the content of the destination folder.
See added args in: https://github.com/Kismuz/btgym/blob/master/examples/unreal_stacked_lstm_strat_4_11.ipynb
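A rough sketch of the new args in use (a configuration fragment only; other Launcher, env, policy and trainer settings are omitted, the import path and cluster_config keys may differ by version, and the paths are placeholders; see the linked notebook for a complete, working configuration):

```python
from btgym.algorithms import Launcher  # import path may differ by btgym version

launcher = Launcher(
    cluster_config=dict(
        host='127.0.0.1',
        port=12230,
        num_workers=4,
        num_ps=1,
        log_dir='./tmp/my_experiment',
        # New: load pre-trained weights from here and start at global_step=0:
        initial_ckpt_dir='./pretrained_model',
    ),
    save_secs=600,  # new arg: write a checkpoint every 600 sec (the default)
    # ... env, policy and trainer configs go here ...
)
launcher.run()

# After training, export the latest parameters to an external directory.
# Note: this overrides the destination folder's content.
launcher.export_checkpoint('./exported_model')
```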
mysl commented 5 years ago

Hi, @Kismuz. Just being curious, does this mean that you have made good progress on overcoming the generalization issue and are nearly ready to launch out-of-sample live testing? Thanks!

Kismuz commented 5 years ago

@mysl, In brief:

good progress on overcoming the generalization issue

yes, indeed it seems we've got some kind of breakthrough here with a combined model-based/model-free approach and a constrained problem setup; it is still at an early research stage and not ready for publishing results

near ready for launching out of sample live testing

no, I don't think so. Backtest results are still unstable and a lot of work needs to be done before one can hope for robust live results

Expanded:

Model based approach:

exactly as in other real-world domains, we face an inability to gather a sufficient amount of real experience (or backtest data, in our case) given the low sample efficiency of current DRL algorithms; one established approach is to build a decent model of the environment and train the agent on generated data, hoping it will do well on the real environment. The big plus is that the amount of data is infinite, making generalisation natural; the big drawback is model inaccuracy (aka model bias), resulting in lower asymptotic performance than the model-free approach. There are methods to remove this bias (meta-learning is one popular choice). Tons of papers on it recently (see some below):

""" ...The now standard approach to model-based reinforcement learning is to use nite set of empirically collected data Dn to approximate the true transition probability distribution P by model ^ P and the expected reward function r by ^r. The learned model is then used to generate new samples. A standard RL algorithm can use these samples to find a close to optimal policy.... """ [from somewhere]

In our case building such a model means building a model that generates backtest data (assuming the backtrader simulator itself is a near-optimal model of a real broker execution engine, at least at this point). My approach is to build a class of probabilistic generative models, fit it to the available real train data and use it to get cheap training trajectories which are statistically identical to real data. Here I actually mean statistically identical given the problem objective, which naturally leads us to...

Constrained [down to stat.arb.] problem setup:

to be able to get a tractable data model it is essential to impose some constraints on the 'trading problem' itself. One sensible approach is to limit oneself to the mean-reverting trading paradigm. For example, we aim to find the optimal policy among the class of mean-reverting policies, given a pair of [potentially] co-integrated assets, by posing constraints on agent actions: we only allow opening/closing positions in opposite directions. Such a constraint makes the data modelling problem tractable: one can fit models like Ornstein-Uhlenbeck, CIR, etc., which have some nice properties (Markovian, stationary, etc.).
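The action constraint for such a pair could look roughly like this (action names and the state representation are hypothetical, not btgym's actual action space):

```python
def allowed_actions(spread_position):
    """Action mask for the constrained pair-trading setup described above:
    positions on the two co-integrated legs may only be opened in opposite
    directions, and an existing spread position may only be held or closed.
    Names are illustrative, not btgym's actual API."""
    if spread_position == 'flat':
        # open the spread either way: long leg A / short leg B, or vice versa
        return ['hold', 'open_long_short', 'open_short_long']
    # once a spread position exists, only hold or close it
    return ['hold', 'close']
```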

I have some implementations here: https://github.com/Kismuz/btgym/blob/master/btgym/research/model_based/model.py
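As a minimal illustration of the idea (a sketch only, not the model.py implementation), an Ornstein-Uhlenbeck process can be fitted to a spread series via its exact AR(1) discretisation, x[t+1] = mu + b*(x[t] - mu) + eps with b = exp(-theta*dt), and then used to generate synthetic mean-reverting trajectories:

```python
import numpy as np

def fit_ou(x, dt=1.0):
    """Fit Ornstein-Uhlenbeck parameters (theta, mu, sigma) to a 1-d series
    by regressing x[t+1] on x[t]. Sketch only, not btgym's model.py code."""
    b, a = np.polyfit(x[:-1], x[1:], 1)     # AR(1) slope and intercept
    theta = -np.log(b) / dt                 # mean-reversion speed
    mu = a / (1.0 - b)                      # long-run mean
    resid = x[1:] - (a + b * x[:-1])
    # invert the exact-discretisation noise variance sigma^2*(1-b^2)/(2*theta)
    sigma = resid.std() * np.sqrt(2.0 * theta / (1.0 - b ** 2))
    return theta, mu, sigma

def simulate_ou(theta, mu, sigma, x0, n, dt=1.0, rng=None):
    """Generate a synthetic mean-reverting trajectory: cheap training data
    statistically similar to the fitted spread."""
    if rng is None:
        rng = np.random.default_rng()
    b = np.exp(-theta * dt)
    sd = sigma * np.sqrt((1.0 - b ** 2) / (2.0 * theta))
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = mu + b * (x[t - 1] - mu) + sd * rng.standard_normal()
    return x
```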

Fighting model bias:

an active area of research (links below) which I have not even touched yet; most of my research and coding work for the last two months has been more econometric than RL-related;

Related papers:

mysl commented 5 years ago

@Kismuz, brilliant! Sounds similar to the Dyna approach of combining model-based/model-free algorithms. Thank you so much for the detailed explanation! And a lot of interesting papers to read as well :-)

JaCoderX commented 5 years ago

@Kismuz I think the new update has a small problem. Before the update, the 'train' dir contained the *.pbtxt file that TensorFlow uses to show the computational graph; now I have the 'worker_x' directory but not the 'train' equivalent.

Kismuz commented 5 years ago

@JacobHanouna, thanks for spotting; fixed graph vis.