For more extensive documentation, consult the Wiki.
Financial institutions make decisions on whether to buy or sell assets for various reasons, including customer requests, fundamental analysis, technical analysis, top-down investing, bottom-up investing and many more. These high-level trading strategies often define the purpose of the business and how the institution positions itself in the various financial markets and, where applicable, towards its customers. Regardless of the high-level trading strategy applied, the invariable outcome is a decision to buy or sell assets. Hence, an execution strategy aims to execute (buy or sell) orders of the demanded asset at a favourable price.
CTC-Executioner is a tool that provides an on-demand execution strategy for limit orders on cryptocurrency markets using Reinforcement Learning techniques. The underlying framework provides order book and match engine functionality, which makes it possible to analyse order book data and derive features from it. Those findings can then be used to dynamically update the decision-making process of the execution strategy. In addition, backtesting functionality is provided, which allows the performance of a model to be determined on a given historical data set. The project further conducts several experiments that analyse the behaviour of order matching in a controlled environment using historical order book data, with the aim of identifying the limitations to overcome, as well as providing insight into how market situations might be exploited for a more favourable execution.
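As a rough illustration of what such a backtest over historical order book data could look like, here is a minimal Python sketch; all names (`Trade`, `backtest_limit_order`) are hypothetical and not the project's actual API:

```python
# Hypothetical sketch of backtesting a limit order against historical
# trades; simplified, not the actual CTC-Executioner implementation.
from dataclasses import dataclass

@dataclass
class Trade:
    price: float   # execution price of a historical trade
    size: float    # traded volume in BTC

def backtest_limit_order(trades, limit_price, target_qty, horizon):
    """Simulate a resting sell limit order against historical trades.

    trades: iterable of (timestamp, Trade) within one episode
    limit_price: price at which we offer to sell
    target_qty: BTC we want to execute
    horizon: seconds available before the episode ends
    Returns (filled quantity, average execution price).
    """
    filled, notional = 0.0, 0.0
    start = None
    for ts, trade in trades:
        start = ts if start is None else start
        if ts - start > horizon:
            break                       # time horizon exhausted
        if trade.price >= limit_price:  # a buyer reached our offer
            qty = min(trade.size, target_qty - filled)
            filled += qty
            notional += qty * limit_price  # resting order fills at its price
        if filled >= target_qty:
            break
    avg_price = notional / filled if filled else float("nan")
    return filled, avg_price
```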
Progress updates will follow below.
Progress: wrote my own Q-Learner and a deep-learning buzzword-compliant version. Got the matching engine operational now as well. General OpenAI-style comment on reinforcement learning: you need an explicit reward function, and this is responsible for the state explosion. No need to make assumptions or build a model; the learning problem can be left as a black box, it's end-to-end. Current thesis work: ignore the distributed order book problem. Our joint Q-Learner work from 7 years ago, "Enhancement of BARTERCAST Using Reinforcement Learning to Effectively Manage Freeriders", could be revisited for a PhD thesis.
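For reference, the core of a tabular Q-Learner of the kind mentioned above fits in a few lines; the state/action encoding here is illustrative only, not the thesis' actual feature set:

```python
# Minimal tabular Q-learning sketch; states and actions are placeholders.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = range(-100, 101, 10)   # e.g. limit levels relative to the spread

Q = defaultdict(float)           # Q[(state, action)] -> expected return

def choose_action(state):
    if random.random() < EPSILON:                         # explore
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(state, a)])      # exploit

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```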
Progress update:
ctc_executioner (3).pdf
Points for improvement:
Latest draft: ctc_executioner (5).pdf
With 30 seconds left to buy 1.0 BTC, in Figure 6.4c, the orders placed above the spread become stable for any such limit level, much more so than with the same time horizon in the previous investigation with data set I. This is likely due to the higher order pressure of data set II, as described in Section 6.2, since there are more market participants willing to sell. The return curve that indicates sell orders placed by an agent, shown in Figure 6.4d, becomes more evenly distributed. Therefore, limit orders tend to become more rewarding and an agent might benefit from a slight increase in price within the given time horizon.
This pattern becomes clearly apparent when a time horizon of 60 or 100 seconds is given, as shown in Figures 6.4f and 6.4h respectively. With the increased time horizon, the assumptions stated at the beginning of this section are confirmed and the agent, when trying to sell shares, should indeed place orders deep in the order book. As time passes and the market price rises, market participants are willing to buy at an increasing price, and an agent can expect to sell all assets at such an increased price without the need for a subsequent market order. Conversely, if the agent decides to offer the assets at a decreasing price, as indicated by the higher limit levels above the spread, less reward can be expected. More precisely, for a time horizon of 100 seconds, the agent is expected to receive up to $7.00 less when choosing to cross the spread with a limit level of +100, compared to some negative limit level. Figures 6.4e and 6.4g show the expected results of an agent that buys assets within 60 and 100 seconds respectively. As is evident, during this rising market, the expected damage can be minimized by crossing the spread and buying immediately. The advice stated before remains: the agent should choose a price a few steps ($0.10) above the market price, as there is enough liquidity in the market to buy the demanded number of assets.
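For illustration, a return curve per limit level of the kind shown in Figures 6.4c-6.4h could be approximated by sweeping limit levels and averaging backtest outcomes over many historical episodes. This sketch reuses the hypothetical `backtest_limit_order` helper from the earlier sketch; the sign convention (positive levels cross the spread when selling) is assumed from the excerpt above:

```python
# Sketch: average execution outcome per limit level over historical episodes.
# `backtest_limit_order` is the hypothetical helper defined earlier.
TICK = 0.10  # price step assumed from the excerpt above

def return_curve(episodes, best_ask, target_qty, horizon):
    curve = {}
    for level in range(-100, 101, 10):
        # Assumed convention: positive levels move the sell price below the
        # ask (crossing the spread), negative levels sit deep in the book.
        limit_price = best_ask - level * TICK
        outcomes = []
        for trades in episodes:
            filled, avg_price = backtest_limit_order(
                trades, limit_price, target_qty, horizon)
            # Reward: proceeds relative to selling at the initial best ask;
            # in practice unfilled volume would be cleared by a market order.
            proceeds = filled * (avg_price - best_ask) if filled else 0.0
            outcomes.append(proceeds)
        curve[level] = sum(outcomes) / len(outcomes)
    return curve
```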
"PhD level" expansion.. Re-use your Q-learner with deep reinforcement learning to cooperate, while under continuous attack from freeriders. iterated PD within group context, pair-wise encounters. Reward when Alice and Bob cooperate, penalty if Charlie defects on you. Trustchain to view historical behavior.
Related work: "Learning to Protect Communications with Adversarial Neural Cryptography" by Google Brain: "We ask whether neural networks can learn to use secret keys to protect information from other neural networks. Specifically, we focus on ensuring confidentiality properties in a multiagent system, and we specify those properties in terms of an adversary. Thus, a system may consist of neural networks named Alice and Bob, and we aim to limit what a third neural network named Eve learns from eavesdropping on the communication between Alice and Bob. We do not prescribe specific cryptographic algorithms to these neural networks; instead, we train end-to-end, adversarially. We demonstrate that the neural networks can learn how to perform forms of encryption and decryption, and also how to apply these operations selectively in order to meet confidentiality goals."
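As a rough illustration (a simplification, not the paper's actual code), the adversarial objectives can be sketched as follows, with d() as the L1 reconstruction error over N plaintext bits:

```python
# Simplified sketch of the adversarial training objectives from the paper.
import numpy as np

N = 16  # plaintext length in bits

def d(p, p_hat):
    """L1 distance between true and reconstructed plaintext bits."""
    return np.abs(p - p_hat).sum()

def eve_loss(p, p_eve):
    # Eve simply minimises her reconstruction error.
    return d(p, p_eve)

def alice_bob_loss(p, p_bob, p_eve):
    # Bob must reconstruct the plaintext well, while Eve should do no
    # better than random guessing (N/2 bits wrong on average).
    bob_term = d(p, p_bob)
    eve_term = ((N / 2 - d(p, p_eve)) ** 2) / (N / 2) ** 2
    return bob_term + eve_term
```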
The attached report is almost complete, except:
Final comment round:
Job well done! thesis page in official TUDelft repo
Direct link to .PDF file
Documentation includes: Jupyter Notebooks and wiki
I think this issue can be closed. Given the variety of attacks on blockchain ordering, fairness in decentralized markets is still an open issue and I think a very promising direction for a follow-up thesis/paper.
Expands upon the market #2559 (basic background knowledge on market orders). The thesis goals have been to:
1) analyze order book data and build a model which "guarantees" optimal order execution and
2) subsequently provide this functionality to the tribler market in the form of an execution engine, such that users will be able to get a fair price for a product.
Primary thesis adviser: machine learning expert Marco Loog.