Closed: Kismuz closed this issue 4 years ago
@Kismuz this is really outstanding work, Leonardo Da Vinci level stuff :)
A quick question that popped to mind after going over the notebook. If I understand correctly, you are trying to tackle the problem of the environment being non-stationary, which makes any single policy we learn suboptimal (at best, if the environment's variation is slow). So the assumption here is that if we find two assets driven by the same non-stationary environment, the relation between them is closer to stationary?
In the example you took BTC/USD and ETH/USD, then found the relation between them as a price ratio and a spread. My question here is: what would have happened if I took ETH/BTC directly and used its standard deviation as a measure of spread?
Analytic data model: related sources and literature
A short and concise introduction to applications of OU processes in statistical arbitrage and, especially, the view of cointegration as a model-free concept can be found in [1], §3–4. Connections between orthogonal time-series decomposition and cointegration are also drawn in [6]. A simple and illuminating view of the cointegration coefficient as a ratio of volatilities attributed to a common stochastic trend can be found in [3], §3.6, Prop. 2. Construction of a spread with desired properties is discussed in [2], [4], [5]. A general analytical approach to the optimal mean-reverting trading problem is derived in [7]. The two-factor model approach is inspired by [8]. Reviews of non-Gaussian OU processes and their applications are [9], [10] and [11, in Russian]. Recursive methods for exponentially smoothed mean, variance and covariance estimation are derived from [12].
[1] Attilio Meucci, “Review of Statistical Arbitrage, Cointegration, and Multivariate Ornstein-Uhlenbeck”, 2010
[2] Richard Diamond, “Learning and Trusting Cointegration in Statistical Arbitrage”, 2013
[3] Dimitrios D. Thomakos, “Optimal Linear Filtering, Smoothing and Trend Extraction for Processes with Unit Roots and Cointegration”, 2008
[4] Marco Cuturi, Alexandre d'Aspremont, “Mean-Reverting Portfolios: Tradeoffs Between Sparsity and Volatility”, 2015
[5] Marco Cuturi, Alexandre d'Aspremont, “Mean Reversion with a Variance Threshold”
[6] David Harris, “Principal Component Analysis of Cointegrated Time Series”, 1997
[7] Tim Leung, Xin Li, “Optimal Mean Reversion Trading: Mathematical Analysis and Practical Applications”, 2016
[8] Eduardo Schwartz, James E. Smith, “Short-Term Variations and Long-Term Dynamics in Commodity Prices”, 2000
[9] Ole E. Barndorff-Nielsen, Neil Shephard, “Non-Gaussian OU based models and some of their uses in financial economics”, 2001
[10] N. Cufaro Petroni et al., “Stationary distributions of non Gaussian Ornstein-Uhlenbeck processes for beam halos”, 2007
[11] A. V. Chertok et al., “Regime Switching Detection for the Lévy Driven Ornstein–Uhlenbeck Process Using CUSUM Methods”, 2016
[12] Tony Finch, “Incremental calculation of weighted mean and variance”, 2009
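The recursive estimators mentioned above ([12]) admit a compact O(1)-per-observation implementation. The sketch below follows Finch's exponentially weighted mean/variance recursions; the covariance update is the analogous cross term, added here as an assumption rather than taken verbatim from [12]. The class name `EWStats` is illustrative, not from the original notebook.

```python
class EWStats:
    """Exponentially weighted mean / variance / covariance,
    updated recursively in O(1) per observation (after Finch, 2009)."""

    def __init__(self, alpha):
        self.alpha = alpha              # smoothing factor in (0, 1]
        self.mean_x = self.mean_y = 0.0
        self.var_x = self.cov_xy = 0.0
        self.initialized = False

    def update(self, x, y):
        if not self.initialized:
            # Seed the means with the first observation:
            self.mean_x, self.mean_y = x, y
            self.initialized = True
            return
        dx = x - self.mean_x
        dy = y - self.mean_y
        incr_x = self.alpha * dx
        self.mean_x += incr_x
        self.mean_y += self.alpha * dy
        # Finch's recursive EW variance: var' = (1 - a) * (var + a * dx^2);
        # the covariance line is the analogous cross-term extension (assumption):
        self.var_x = (1.0 - self.alpha) * (self.var_x + dx * incr_x)
        self.cov_xy = (1.0 - self.alpha) * (self.cov_xy + self.alpha * dx * dy)
```

For a constant input stream the variance recursion decays to zero and the mean stays fixed, which is a quick sanity check on the update rules.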
The first of the scheduled demo notebooks has been added to the new folder model-based stat-arb examples:
1. An introduction to the analytic pair price model
Upcoming:
2. Basic training and measuring model bias
3. Compensating model bias with a meta-learning approach
Problem setup
The task is constrained to the well-known setup of pair statistical arbitrage trading. An agent operates on two asset prices which are assumed to be cointegrated (i.e. there exists a linear combination of those time series that is stationary and exhibits mean-reverting properties). Such a combination is further referred to as the “spread”. The agent is only allowed to open a balanced “spread position”: short one asset and long the other, and vice versa. “Balanced” here means that the relative amount of exposure opened on each asset is determined by the cointegration relation. In econometrics, cointegration is a well-established and widely used concept providing a strong theoretical background for the mean-reverting trading paradigm. Our key point is that this task can be cast as an [at least locally] stationary Markov decision process, justifying application of the RL framework.
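A minimal sketch of the spread construction described above, on simulated data (the data-generating process, seed, and variable names are illustrative assumptions, not the notebook's actual model). Two log-price series share one random-walk trend; regressing one on the other by least squares recovers the cointegration (hedge) ratio, and the residual combination is the stationary spread:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: two series sharing one common stochastic trend.
n = 5000
trend = np.cumsum(rng.normal(0, 0.01, n))   # common random-walk component
noise = rng.normal(0, 0.02, n)              # stationary disturbance
p1 = trend + 0.5 * noise
p2 = 2.0 * trend - 0.5 * noise              # cointegration coefficient ~ 2

# Estimate the cointegration (hedge) ratio by least squares: p2 ~ beta * p1.
beta = np.polyfit(p1, p2, 1)[0]

# The "spread" is the (approximately) stationary linear combination.
spread = p2 - beta * p1

# A balanced spread position holds 1 unit of asset 2 against beta units of asset 1.
print(f"estimated beta = {beta:.2f}, spread std = {spread.std():.3f}")
```

In practice one would also test the residual for stationarity (e.g. an Engle–Granger-style unit-root test) rather than trust the regression alone.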
Approach
Current state-of-the-art deep reinforcement learning algorithms are generally characterized by low sample efficiency, which makes it either prohibitively expensive or simply impossible to train on experience collected from the genuine environment. The proposed way to satisfy these objectives is to combine model-based control and model-free reinforcement learning methods. The idea is to use a finite set of empirically collected data to approximate the true environment dynamics with some “world” model. The learned model is then used to generate new experience samples. A standard model-free RL algorithm can use these samples to find a close-to-optimal behavioral policy and exploit it in the original environment. The approach is appealing because it provides infinite simulated experience, resulting in cheap training and reduced overfitting. The principal drawback is that the learned policy shows suboptimal performance due to intrinsic model inaccuracy, referred to as “model bias”. Closing the performance gap induced by model bias is a key modern challenge for the reinforcement learning community. Two complementary approaches exist: either improve the model to minimise bias, or correct the behavioral policy to compensate for the existing bias.
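The model-based idea above can be sketched concretely for the spread setting: fit a discrete-time OU model (equivalently, an AR(1) regression) to a finite batch of observed spread values, then sample unlimited synthetic rollouts from the fitted model as training experience for a model-free learner. Everything below is a hypothetical illustration under those assumptions; the function names and the stand-in "real" data are not from the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ou(spread, dt=1.0):
    """Fit discrete-time OU parameters via AR(1) regression:
    x[t+1] = a + b * x[t] + eps,  with  b = exp(-theta * dt)."""
    x, y = spread[:-1], spread[1:]
    b, a = np.polyfit(x, y, 1)       # slope, intercept
    resid = y - (a + b * x)
    theta = -np.log(b) / dt          # mean-reversion speed
    mu = a / (1.0 - b)               # long-run mean
    sigma_eps = resid.std()          # innovation scale
    return mu, theta, sigma_eps, b

def simulate_ou(mu, b, sigma_eps, x0, n, rng):
    """Generate a synthetic spread rollout from the fitted model."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = mu + b * (x[t - 1] - mu) + sigma_eps * rng.normal()
    return x

# "World model" fitted on a finite batch of observations (here itself simulated
# as a stand-in for real data), then used to produce cheap simulated experience:
real = simulate_ou(0.0, 0.95, 0.1, 0.0, 2000, rng)
mu, theta, sigma_eps, b = fit_ou(real)
synthetic = simulate_ou(mu, b, sigma_eps, real[-1], 10000, rng)
```

The gap between the fitted parameters and the true ones is exactly the "model bias" discussed above: any policy trained on `synthetic` inherits that inaccuracy.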
This thread is open for general discussion. For bugs, errors, specific requests, etc., please open an issue.