Closed: synctext closed this issue 6 years ago.
ToDo: system architecture figure + initial system design that links the Tunnel community, the orderbook in Python, relaying, spam-prevention, etc.
Please document DDoS problem (100Mbps for $5/month).
Problem of DDoS with Tor-based orderbook relays. Possible idea: start with zero help, directly spread your bid/ask, do trades, build trust; others will then start to help you.
Prototype: back to the 2013 code; proxies in the network route your traffic. No Chaumian mixing or onion crypto, so it is trivial to match traffic by sniffing.
Related work: Bitsquare (https://bitsquare.io):
They seem to use Tor together with mainnet.
The current idea to prevent bid/ask spam is to use either a cybercurrency or TrustChain (a reputation-based solution). Another option is to use this in combination with network latency, as documented in #2541.
Build a fresh new community within Dispersy which builds a low-latency overlay with network neighbors. With each peer you see within this community you do a ping/pong handshake to determine the network latency. A random walk across the network does not converge quickly; you only stumble upon close, low-latency peers at random. A small bias will dramatically boost the speed at which you can find 10 close peers in a group of 10 million peers. For instance, with a 50% coin toss you introduce either a random peer or one of your closest top-10 peers. Due to the triangulation effect this boosts convergence (see the sketch below).
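A minimal sketch of that coin-toss bias; the function name and the shape of the peer lists are assumptions, not the actual Dispersy walker API:

```python
import random

def select_introduction_candidate(random_peers, closest_peers, bias=0.5):
    """With probability `bias`, introduce one of our top-10 lowest-latency
    peers; otherwise introduce a uniformly random peer. The random branch
    preserves exploration, the biased branch exploits triangulation."""
    if closest_peers and random.random() < bias:
        return random.choice(closest_peers[:10])
    return random.choice(random_peers)
```

Setting `bias=0.8` would give the 80/20 variant asked about later in this thread.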
The next step is to build low-latency proxies. These tunnels are now fast and restricted to a certain region, which addresses our problem: spam is now also restricted to a certain region. The final policy to prevent spam is to combine latency with tradechain reputation: you need both low latency and sufficient reputation to be inserted into an orderbook. Peers with a bad-latency connection need to compensate and build up a higher reputation before they can start trading (see the admission sketch below). Note: the current code avoids full Tor-inspired relay complexity; it is just a proxy.
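A minimal sketch of such a combined admission policy; the thresholds and the latency/reputation trade-off are illustrative assumptions:

```python
# Assumed thresholds, for illustration only.
MAX_LATENCY_MS = 50.0
MIN_REPUTATION = 10.0

def may_insert_order(latency_ms, reputation):
    """Both low latency and sufficient reputation are required; a peer
    with a worse latency must compensate with a higher reputation."""
    required = MIN_REPUTATION * max(1.0, latency_ms / MAX_LATENCY_MS)
    return reputation >= required
```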
ToDo: incrementally improve the current code. Get a 1-hop proxy operational. Add the low-latency bias.
The current Bitcoin fee does not enable microtransactions for bids/asks. It is $4 per KByte for 97.2% of blocks:
Thus the best approach is to align all the incentives: positive reinforcement within the ecosystem, where traders with a good trade history get all the help they want and traders without this history have an incentive to behave positively. How to solve the bootstrap problem of traders with zero reputation on their traderchain? For instance, you need to help others and relay orders to build up your reputation.
ToDo: incrementally improve the current code.
{thoughts} We did a latency measurement 8 years ago: http://kayapo.tribler.org/trac/wiki/LatencyEstimationReport It would be good for experimental results and incremental progress to have an operational and solid "latency community". A deployed system produces thesis results and a good building block for proxy development; the measured latencies can then be used for proxies and limit orderbooks. Possible planning: first finish the experimental latency results, then proxies/trading/security. Or a primary DDoS focus; self-reinforcing trust.
Ongoing coding work on the latency community, proxies, etc.:
Professional trading needs to be low-latency, private, and DDoS-proof.
The nanosecond range seems to be the state of the art for trading.
DDoS really matters in the wild. It all happened in just 45 milliseconds on the GDAX market:
> The crash at 3:30 p.m. New York time on June 21 drove the currency down to 10 cents from $317.81. The cause, White said, was a single $12.5 million trade -- one of the biggest ever -- placed by a customer as a market order, or a request to sell immediately. That pushed ethereum to $224.48, but the pain didn’t end there. The decline triggered sell orders from traders who’d requested to bail on the currency if prices fell to certain levels, and prompted GDAX to liquidate some margin trades.
ToDo: incremental progress. Deploy the latency community with 1 extra message: get_recent_seen_latencies(), which shows the most recent end-to-end response times at the Dispersy community level, with the last 8(?) digits of each IPv4 address obfuscated. Only use a single UDP packet for this gossip reply (see the sketch below). Next: crawl latencies within a Gumby experiment.
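A hedged sketch of such a reply; the message name comes from this ToDo, but the record format, the single-packet budget, and masking the low bits of each address (rather than literal decimal digits) are assumptions:

```python
import socket
import struct

UDP_PAYLOAD_BUDGET = 1200  # stay well inside one UDP packet

def obfuscate_ip(ip, masked_bits=8):
    """Zero out the low `masked_bits` bits of an IPv4 address."""
    packed, = struct.unpack("!I", socket.inet_aton(ip))
    packed &= ~((1 << masked_bits) - 1)
    return socket.inet_ntoa(struct.pack("!I", packed))

def encode_recent_latencies(samples):
    """Encode (ip, latency_ms) pairs as 6-byte records until the budget
    for a single UDP packet is exhausted."""
    payload = b""
    for ip, latency_ms in samples:
        record = socket.inet_aton(obfuscate_ip(ip)) + \
                 struct.pack("!H", min(int(latency_ms), 0xFFFF))
        if len(payload) + len(record) > UDP_PAYLOAD_BUDGET:
            break
        payload += record
    return payload
```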
Clear target: build the lowest-latency overlay. Two months: experiments finished.
Experiment with 500 nodes
Nice progress! Next steps:
@basvijzendoorn https://github.com/Tribler/dispersy/pull/526
Thesis-level Gumby experiment:
introduction-response
Upon an introduction request: predict what the latency would be for the requester.
A prime example of a low-latency network, a Bitcoin enhancement: http://bitcoinfibre.org/stats.html
Current status: created a Dispersy latency community, now moved into Dispersy itself. This implementation runs on the DAS5, can measure node-to-node ping times, gossips these results using a dispersy-request-latencies message, and builds a hard-coded lowest-latency peer discovery mechanism (thus killing exploration and randomness).
Using the collected ping times, various existing network-distance algorithms can be applied, such as GNP. Key challenge: upon an introduction request, predicting what the latency would be for the requester. Thus we only need to calculate the latency for a single node every few seconds (a sketch below). Scientific challenge: the algorithms are slow. A matrix of 50 x 50 nodes with X, Y coordinates, assuming symmetric connections, takes 3 seconds with merely 1 iteration.
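A minimal sketch of such a single-node update, assuming 2D GNP-style coordinates and plain gradient descent on the squared prediction error; the function name, step size, and iteration count are my own, not from the thesis code:

```python
import math
import random

def fit_single_node(measured, coords, steps=100, lr=0.05):
    """Fit only the new node's (x, y) against peers with known coordinates,
    instead of re-fitting the full 50 x 50 matrix.
    measured: {peer_id: latency_ms}, coords: {peer_id: (x, y)}."""
    x, y = random.random(), random.random()
    for _ in range(steps):
        gx = gy = 0.0
        for peer, rtt in measured.items():
            px, py = coords[peer]
            dist = math.hypot(x - px, y - py) or 1e-9
            err = dist - rtt              # positive: estimate is too far away
            gx += err * (x - px) / dist   # gradient of the squared error
            gy += err * (y - py) / dist
        x -= lr * gx
        y -= lr * gy
    return x, y
```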
Instead of re-calculating the whole world state every 5 seconds, we can recalculate only the coordinates affected by a newly introduced peer (the incremental algorithm worked out below):
Golden experiments:
Idea: do real ICMP requests to measure ping times without NAT puncturing.
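A minimal sketch, assuming a Linux-style `ping` binary and shelling out to it instead of opening raw ICMP sockets (which would need root); the output parsing is platform-specific:

```python
import re
import subprocess

def icmp_ping_ms(host, count=5):
    """Return the round-trip times in ms reported by the system `ping`."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True).stdout
    return [float(m) for m in re.findall(r"time=([\d.]+) ms", out)]
```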
Master thesis link: https://www.sharelatex.com/project/592c19a601647e1979114c42 Current status: read the background literature more carefully and understand the peer discovery mechanism better. Read and experimented with the peer discovery code. Started writing about the algorithms from the literature in the master thesis.
- Centralized algorithms: Vivaldi, GNP
- Decentralized algorithms: NPS, PIC
- Triangle inequality, AS correction, geolocation: Htrea (2009)
- Triangle inequality violation: TIV detection
- Dynamic clustering: Tarantula (2011), Toread (2010)
- Latency measurements in P2P systems: Latency in P2P; Survey on the application-layer traffic optimization (ALTO) problem; Applying GNP in P2P systems
Thought about an incremental algorithm that recalculates the coordinates of a new peer plus its neighbors upon introduction. Under normal conditions these are around 10 coordinates; with a fast walker around 30 coordinates are recalculated. A maximum number of coordinates for recalculation can be set. The coordinates take their new position based on the latencies to their neighbors. Thus when a new peer is introduced, its measured latencies plus all the latencies measured by its neighbors should be sent with the message. Peer introduction happens on (see the sketch after this list):
on_introduction_request
on_introduction_response
on_puncture
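A hedged sketch of how that recalculation could hang off the three callbacks above; the function name, the neighbor cap, and the data shapes are assumptions (any single-node fit, such as the `fit_single_node` sketch earlier, would slot in as `update_fn`):

```python
MAX_RECALC = 30  # cap on coordinates recalculated per introduction

def on_peer_introduced(new_peer, neighbours, latencies, coords, update_fn):
    """Recalculate the new peer's coordinate plus those of its neighbours:
    roughly 10 under a normal walker, up to MAX_RECALC with a fast walker.
    latencies: {peer_id: {other_peer: rtt_ms}}, coords: {peer_id: (x, y)}."""
    for peer in [new_peer] + neighbours[:MAX_RECALC - 1]:
        coords[peer] = update_fn(latencies.get(peer, {}), coords)
```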
Idea on deleting "old" latencies: delete "old" measured latencies after 10 walker steps; with a fast walker, delete them after 30 walker steps. This way the system stays responsive to changing latencies and to nodes leaving the system.
Idea on latency measurements: do multiple latency measurements and aggregate them to get a better estimate and to suppress outliers. Latency can vary due to temporary computations that block a node. Measurements that appear to be outliers can be deleted; use the median of multiple (for instance 5) measurements, as sketched below.
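A tiny sketch of the median idea (pure standard library); the median is robust against a single slow outlier caused by a temporarily blocked node:

```python
import statistics

def robust_latency(samples):
    """samples: list of round-trip times in ms (e.g. 5 pings)."""
    return statistics.median(samples)

# robust_latency([12.1, 11.9, 250.0, 12.3, 12.0]) -> 12.1
```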
Idea on metrics: use the ranking metric as described in the GNP literature. Also use the relative error as a new error function. A sketch of both:
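A sketch of both metrics as commonly defined in the GNP literature; the exact definitions used in the thesis may differ:

```python
def relative_error(predicted, measured):
    """GNP-style relative error of one predicted latency."""
    return abs(predicted - measured) / min(predicted, measured)

def ranking_accuracy(predicted, measured, k=10):
    """Fraction of the true k nearest peers that also appear in the
    predicted top-k. predicted/measured: {peer: latency_ms}."""
    top = lambda d: set(sorted(d, key=d.get)[:k])
    return len(top(predicted) & top(measured)) / k
```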
Project planning: first build the incremental algorithm. Optimize it and compare it to the decentralized algorithm NPS with the error and ranking metrics. While doing so, document the project, e.g. explain the background literature, the peer discovery mechanism, the new incremental algorithm, and the experiment setup.
System model:
Status: the thesis has its first experiments. Ready for experiments with incremental updates and runtime measurements. X-axis: number of known latency pairs; Y-axis: runtime in ms of a network coordinate update. Possibly different curves for accuracy settings.
Status: have a working incremental model. Next steps: experiments and tweaking the current model. Metrics:
Latency sharing opens the possibility of reporting false latencies and of message delaying. Possible solutions give some protection, but not full protection.
Writing the report.
Dataset: Cornell-King 2500 x 2500 node Latency https://www.cs.cornell.edu/people/egs/meridian/data.php
Current thesis status: chapter focus fixed.
Next step: a solid experiment, focused on the core; explore the trade-off between accuracy and computational time; write 1-3 pages in already-polished thesis style.
Current status: Dataset: King 1740 x 1740 latency nodes https://pdos.csail.mit.edu/archive/p2psim/kingdata/ Thesis status: described and developed the computational-time metric and the ranking and relative-error accuracy metrics; experiment graphs added. Delft_University_of_Technology_Thesis_and_Report(2).pdf
Experiments one, two, and three have been run.
Proposed next steps: add more settings, run experiment four, and run experiments with the decentralized market.
Delft_University_of_Technology_Thesis_and_Report.pdf
Status: experiments three and four done. Code cleaned and ready to deploy.
Proposal: continue with writing. Have the experiment 3 and 4 measurements start after some time instead of from the beginning.
Split in parts. How to present?
Implemented the low-latency community! Uses Gumby, directed introductions, and the coin-flip decentralized algorithm on the DAS5 with a real Tribler community.
Goal-driven experimental section: the goal of our experiments is to compare various network coordinate algorithms for creating a low-latency overlay.
Key experiment: the cost of joining a converged low-latency overlay as a new peer.
When you come online: the cost in bytes to find your lowest-latency neighbors.
One iteration per second equals 3600 latency measurements per peer per hour. Fast discovery of the lowest-latency neighbors. When to throw measurements away?
Keep a keep-alive to the lowest-latency peer ever found (not yet used), or drop the highest-latency peer (a sketch below).
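A minimal sketch of that eviction policy; the table size and data shapes are assumptions:

```python
MAX_NEIGHBOURS = 10  # assumed neighbour-table size

def admit(neighbours, peer, latency_ms):
    """neighbours: {peer: latency_ms}. Add the new peer; when the table
    overflows, drop the highest-latency entry while pinning the
    lowest-latency peer found so far."""
    neighbours[peer] = latency_ms
    if len(neighbours) > MAX_NEIGHBOURS:
        pinned = min(neighbours, key=neighbours.get)  # best peer: keep alive
        worst = max(neighbours, key=neighbours.get)
        if worst != pinned:
            del neighbours[worst]
    return neighbours
```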
3 or 4 algorithms with a name. Simple, Naive, Guido97, MicrosoftPaper2012, 10-steps, ?
Figures must be readable in greyscale; no multi-colored dots.
X-axis?
Science: what is the optimal amount of randomness, 50/50 or 80/20?
"This lack of knowledge results in sub-optimal solutions in the above example." no thriller writing style, put the relevance in first sentence. like: another example of an algorithm with only sub-optimal solution because future events are unknown is the k-server problem. News events are unpredictable and journalists are sent to jobs, without knowing the next event. This results in inefficiencies". More fitting for intro chapters probably.
"Broadcast of bid or ask match request towards other peers." how to sell the code implemented around trade privacy?
ToDo: first a problem-description chapter, with privacy and trading, plus related work, the state of the art, and incremental algorithms.
Current status: documented the incremental algorithms and the eclipse attack. Documented experiments 1+2. Low-latency overlay resilience against the eclipse attack.
Next steps:
Do all experiments:
Explanation of algorithms.
Comments:
In general: lots of pieces of text. Making progress. Please focus on creating a single storyline; start at the start.
Intro of the problem in either Chapter 1 or 2. Illustration:
Chapter 3: "In order to solve the complexity problems of the GNP algorithm in the decentralized Tribler setting we introduce an incremental algorithm approach to stretch the computation of the solution over time." is too complex for an opening sentence. More like: privacy and latency matter for online markets; we now focus on incremental algorithms to predict the latency to a given Internet location; this is the cardinal primitive for building a low-latency overlay, as we shall demonstrate in our experimental section.
k-server is an explanatory example, but does it need its own dedicated Figure 3.1?
Is the chapter focus incremental latency prediction?
Where to discuss the state of the art and GNP? The 2004 MIT Vivaldi system:
Mention methods, but there is no need to include other people's formulas (unless you created a superior one).
Reposition Chapter 4 (basic algorithm)?
Perhaps move Tribler peer discovery to after all the latency material.
(Last on the ToDo list) Explain how you made your own latency algorithm, including the 4 variants.
Quick comments:
I think the title of this issue is outdated (the focus of this thesis has changed over time)?
Thesis progress:
Accuracy of the top-10 latency peers: a newly entering peer is dotted, a peer present from the beginning of the experiment is solid. A newly entering peer has an advantage because the quality of its introductions is higher. Quality of introductions of a newly entering peer: a newly entering peer gains high-quality introductions faster. Quality of introductions during the experiment.
please fix: "In the default setting in the low latency overlay latency information is obtained every second with the ping-pong mechanism from every peer in the neighbourhood."
"the other 50% of node selections a peer with a low latency toward the selecting peer is chosen."
Currently implemented: the `ping` and `pong` exchange includes the latencies of 10 peers. Each peer keeps a local ping history list; this linear list is pushed serially to all other peers, keeping a counter per peer of what has already been sent (difficult to read from the thesis text; a sketch below). Proof-of-running-code experiment: taking a step.
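A hedged sketch of that push mechanism; the class and field names are assumptions, not the deployed code:

```python
class PingHistory:
    def __init__(self):
        self.history = []        # linear list of (peer_id, latency_ms)
        self.sent_upto = {}      # peer_id -> index already pushed to that peer

    def record(self, peer_id, latency_ms):
        self.history.append((peer_id, latency_ms))

    def next_batch(self, peer_id, size=10):
        """Next slice of the history for `peer_id`, e.g. piggybacked on a
        pong message carrying the latencies of 10 peers."""
        start = self.sent_upto.get(peer_id, 0)
        batch = self.history[start:start + size]
        self.sent_upto[peer_id] = start + len(batch)
        return batch
```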
Thnx for the thesis update! Getting a 100% working system, due to good predictive dataset? {Contacted 3rd committee member for master defense}
Completed: final master thesis report
Financial markets offer significant privacy to trading firms: leakage of market positions and trade history hands competitors an advantage. So traders will only operate on decentralized markets if their privacy is protected. Regulators obviously have more access.
Builds upon: #2559