For more extensive documentation, consult the Wiki.
Financial institutions make decisions on whether to buy or sell assets for various reasons, including customer requests, fundamental analysis, technical analysis, top-down investing, bottom-up investing and many more. These high-level trading strategies often define the purpose of the business and how the institution positions itself in the various financial markets and, where applicable, towards its customers. Regardless of the high-level trading strategy applied, the invariable outcome is a decision to buy or sell assets. Hence, an execution strategy aims to execute (buy or sell) orders of the demanded asset at a favourable price.
CTC-Executioner is a tool that provides an on-demand execution strategy for limit orders on cryptocurrency markets using Reinforcement Learning techniques. The underlying framework provides order book and match engine functionality, which makes it possible to analyse order book data and derive features from it. Those findings can then be used to dynamically update the decision-making process of the execution strategy. In addition, backtesting functionality is provided, which allows the performance of a model to be determined on a given historical data set. The project further conducts several experiments that analyse the behaviour of order matching in a controlled environment using historical order book data, with the aim of identifying the limitations to overcome, as well as providing insight into how market situations might be exploited for a more favourable execution.
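As a rough illustration of what such a backtest over historical order book data could look like, here is a minimal Python sketch; all names (`Trade`, `backtest_limit_order`) are hypothetical and not the project's actual API:

```python
# Hypothetical sketch of backtesting a limit order against historical
# trades; simplified, not the actual CTC-Executioner implementation.
from dataclasses import dataclass

@dataclass
class Trade:
    price: float   # execution price of a historical trade
    size: float    # traded volume in BTC

def backtest_limit_order(trades, limit_price, target_qty, horizon):
    """Simulate a resting sell limit order against historical trades.

    trades: iterable of (timestamp, Trade) within one episode
    limit_price: price at which we offer to sell
    target_qty: BTC we want to execute
    horizon: seconds available before the episode ends
    Returns (filled quantity, average execution price).
    """
    filled, notional = 0.0, 0.0
    start = None
    for ts, trade in trades:
        start = ts if start is None else start
        if ts - start > horizon:
            break                       # time horizon exhausted
        if trade.price >= limit_price:  # a buyer reached our offer
            qty = min(trade.size, target_qty - filled)
            filled += qty
            notional += qty * limit_price  # resting order fills at its price
        if filled >= target_qty:
            break
    avg_price = notional / filled if filled else float("nan")
    return filled, avg_price
```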
Progress updates will follow below.
Progress: wrote my own Q-Learner and a deep-learning buzzword-compliant version. Got the matching engine operational now as well. General OpenAI-style comment on reinforcement learning: you need an explicit reward function, and this is responsible for the state explosion. No need to make assumptions or build a model; the learning problem can be left as a black box, it's end-to-end. Current thesis work: ignore the distributed order book problem. Our joint Q-Learner work from 7 years ago, "Enhancement of BARTERCAST Using Reinforcement Learning to Effectively Manage Freeriders", could be revisited for a PhD thesis.
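For reference, the core of a tabular Q-Learner of the kind mentioned above fits in a few lines; the state/action encoding here is illustrative only, not the thesis' actual feature set:

```python
# Minimal tabular Q-learning sketch; states and actions are placeholders.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = range(-100, 101, 10)   # e.g. limit levels relative to the spread

Q = defaultdict(float)           # Q[(state, action)] -> expected return

def choose_action(state):
    if random.random() < EPSILON:                         # explore
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(state, a)])      # exploit

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```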
Progress update:
ctc_executioner (3).pdf
Points for improvement:
Latest draft: ctc_executioner (5).pdf
With 30 seconds left to buy 1.0 BTC, in Figure 6.4c, the orders placed above the spread become stable for any such limit level, much more so than with the same time horizon in the previous investigation with data set I. This is likely due to the higher order pressure of data set II, as described in Section 6.2, since there are more market participants willing to sell. The return curve that indicates sell orders placed by an agent, shown in Figure 6.4d, becomes more evenly distributed. Therefore, limit orders tend to become more rewarding and an agent might benefit from a slight increase in price within the given time horizon.
This pattern becomes clearly apparent when a time horizon of 60 or 100 seconds is given, as shown in Figures 6.4f and 6.4h respectively. With the increased time horizon, the assumptions stated at the beginning of this section are confirmed and the agent, when trying to sell shares, should indeed place orders deep in the order book. As time passes and the market price rises, market participants are willing to buy at an increasing price, and an agent can expect to sell all assets at such an increased price without the need for a subsequent market order. Conversely, if the agent decides to offer the assets at a decreasing price, as indicated by the higher limit levels above the spread, less reward can be expected. More precisely, for a time horizon of 100 seconds, the agent is expected to receive up to $7.00 less when choosing to cross the spread with a limit level of +100, compared to some negative limit level. Figures 6.4e and 6.4g show the expected results of an agent that buys assets within 60 and 100 seconds respectively. As is evident, during this rising market, the expected damage can be minimized by crossing the spread and buying immediately. The advice stated before remains: the agent should choose a price a few steps ($0.10) above the market price, as there is enough liquidity in the market to buy the demanded number of assets.
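For illustration, a return curve per limit level of the kind shown in Figures 6.4c-6.4h could be approximated by sweeping limit levels and averaging backtest outcomes over many historical episodes. This sketch reuses the hypothetical `backtest_limit_order` helper from the earlier sketch; the sign convention (positive levels cross the spread when selling) is assumed from the excerpt above:

```python
# Sketch: average execution outcome per limit level over historical episodes.
# `backtest_limit_order` is the hypothetical helper defined earlier.
TICK = 0.10  # price step assumed from the excerpt above

def return_curve(episodes, best_ask, target_qty, horizon):
    curve = {}
    for level in range(-100, 101, 10):
        # Assumed convention: positive levels move the sell price below the
        # ask (crossing the spread), negative levels sit deep in the book.
        limit_price = best_ask - level * TICK
        outcomes = []
        for trades in episodes:
            filled, avg_price = backtest_limit_order(
                trades, limit_price, target_qty, horizon)
            # Reward: proceeds relative to selling at the initial best ask;
            # in practice unfilled volume would be cleared by a market order.
            proceeds = filled * (avg_price - best_ask) if filled else 0.0
            outcomes.append(proceeds)
        curve[level] = sum(outcomes) / len(outcomes)
    return curve
```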
"PhD level" expansion.. Re-use your Q-learner with deep reinforcement learning to cooperate, while under continuous attack from freeriders. iterated PD within group context, pair-wise encounters. Reward when Alice and Bob cooperate, penalty if Charlie defects on you. Trustchain to view historical behavior.
Related work: "Learning to Protect Communications with Adversarial Neural Cryptography" by Google Brain: "We ask whether neural networks can learn to use secret keys to protect information from other neural networks. Specifically, we focus on ensuring confidentiality properties in a multiagent system, and we specify those properties in terms of an adversary. Thus, a system may consist of neural networks named Alice and Bob, and we aim to limit what a third neural network named Eve learns from eavesdropping on the communication between Alice and Bob. We do not prescribe specific cryptographic algorithms to these neural networks; instead, we train end-to-end, adversarially. We demonstrate that the neural networks can learn how to perform forms of encryption and decryption, and also how to apply these operations selectively in order to meet confidentiality goals."
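As a rough illustration (a simplification, not the paper's actual code), the adversarial objectives can be sketched as follows, with d() as the L1 reconstruction error over N plaintext bits:

```python
# Simplified sketch of the adversarial training objectives from the paper.
import numpy as np

N = 16  # plaintext length in bits

def d(p, p_hat):
    """L1 distance between true and reconstructed plaintext bits."""
    return np.abs(p - p_hat).sum()

def eve_loss(p, p_eve):
    # Eve simply minimises her reconstruction error.
    return d(p, p_eve)

def alice_bob_loss(p, p_bob, p_eve):
    # Bob must reconstruct the plaintext well, while Eve should do no
    # better than random guessing (N/2 bits wrong on average).
    bob_term = d(p, p_bob)
    eve_term = ((N / 2 - d(p, p_eve)) ** 2) / (N / 2) ** 2
    return bob_term + eve_term
```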
The attached report is almost complete, except:
Final comment round:
Job well done! thesis page in official TUDelft repo
Direct link to .PDF file
Documentation includes: Jupyter Notebooks and wiki
I think this issue can be closed. Given the variety of attacks on blockchain ordering, fairness in decentralized markets is still an open issue and I think a very promising direction for a follow-up thesis/paper.
Expands upon the market #2559 (basic background knowledge on market orders). The thesis goals have been to:
1) analyze order book data and build a model which "guarantees" optimal order execution and
2) subsequently provide this functionality to the tribler market in the form of an execution engine, such that users will be able to get a fair price for a product.
Primary thesis adviser: machine learning expert Marco Loog.