Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0
4.8k stars 444 forks source link

Byzantine-resilient, asynchronous, yet decentralized federated learning #5221

Closed synctext closed 3 years ago

synctext commented 4 years ago

Key points:

devos50 commented 4 years ago

Interesting thesis direction!

We aim to achieve this goal with our own decentralized exchange, which is fundamentally different from existing DEXes. I think the OP identified the main flaws of existing DEXes (e.g., BitShares, Waves, OasisDEX...). Even though the idea of DEXes is interesting, their liquidity is too low to attract traders. Furthermore, even though they offer trust-less asset settlement, their (transaction) costs are still high (of course, this also depends on the specifications of the underlying protocols). Finally, there are many fairness issues attached to blockchain-based exchanges.

I think market fairness as central thesis component could be a viable and novel research direction. This ties directly into mechanism design and behavior of individual agents, whose goal is to optimize their own profits. Can we make a market that provides fairness to traders? Fairness is a multi-dimensional property so you might want to focus on a specific aspect of fairness.

synctext commented 4 years ago

Some overlap here, however they just want their own coin to become big https://docs.bisq.network/dao/phase-zero.html

Bisq, a peer-to-peer exchange network designed for secure, private and
censorship-resistant trading of bitcoin for national currencies and other cryptocurrencies
[snip]
Appendix A: Roles defines the roles individuals play when participating in the Bisq DAO,
such as trader, contributor and stakeholder, and defines several categories of high-trust
bonded contributor roles such as maintainer, operator, and administrator;
jverbraeken commented 4 years ago

Initial exploratory literature review: (a) digital marketplaces (https://fetch.ai/wp-content/uploads/2019/10/Fetch.AI-Economics-white-paper.pdf) (b) distributed ledgers used for financial purposes (http://www.smallake.kr/wp-content/uploads/2018/09/E_JPX_working_paper_No15.pdf / https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3352293) (c) peer-to-peer lending (https://www.oecd.org/daf/ca/The-Potential-for-Blockchain-in-Public-Equity-Markets-in-Asia.pdf)

jverbraeken commented 4 years ago

Initial thoughts after reading some literature, before discussing with Johan:

The idea is that anyone can create a market. I identified several considerations that a market creator should consider:

  1. Checks before the market can be created
    • Government regulations (stocks, airbnb, uber)
      • A partitioned scheme that keeps certain transactions within certain locations may be required to satisfy local jurisdictional requirements
  2. At creation time
    • Send potential buyers/sellers amessage when new marketplace has opened?
      • E.g. notify a bank when someone wants to borrow
    • Digital or physical product?
    • Semantic links with other markets (e.g. ubers in Delft subset of ubers in South-Holland, black ubers subset of ubers)
  3. At run-time
    • Buyer
      • Anonymous or not?
        • Anonymity of both parties is crucial for stock markets, but undesirable for airbnb markets
    • Product
      • Scheduling of the good? (airbnb)
        • Can change every now and then
      • Location of the good? (uber / airbnb)
        • Can change every now and then
      • Order book
        • Type of matching
          • Mostly relevant for stock exchanges: first-in-first-out or pro-rata?
        • Fixed price level or not?
        • Can you see the other asks/bids or not?
          • On stock markets this is desirable, on Amazon this is unnecessary
        • Additional service
          • Reviews for the product or not?
            • Does not make sense for the stock market, but very valuable for airbnb
            • How to handle fake reviews?
          • Recommended items?
    • Seller
      • Anonymous or not?
        • Anonymity of both parties is crucial for stock markets, but undesirable for airbnb markets
  4. At purchase time
    • KNY/AML/CTM? (know your customer / anti-money laundering / counter-terrorism financing)
    • (for physical goods: ) choose third-party escrow to provide protection?
    • Payment details
      • Automatic / manual approval of purchase
        • E.g. does airbnb owner want to have the renter
      • Collatoral or not? (house, car)
      • Payment obligation (paying right now, in terms, physical or monetary goods, crypto or fiat, …)
    • Conditions to enter a marketplace
      • E.g. age for adult stuff or creditworthiness for loans
      • For loans we still need creditworthiness for external party
        • In USA they have a system with bad privacy
        • In Netherlands based on financial situation and loan payment history
        • Additional details may be obligatory to deploy bailiff (incassobureau/deurwaarder)
  5. After purchase
    • Should some party by notified to keep track of the order?
      • E.g. corporate share register managed by a transfer agent for stock markets
    • Cancellation possibility after the item has already been purchased
      • "The ability for clients to correct/cancel/adjust transactions that were inadvertently charged or credited to the wrong account is a common occurrence and well managed by today’s financial institutions. Additionally, complex financial transactions often include the ability to reverse the transaction based on contractual stipulations as a desirable feature. The ability to cancel or reverse a transaction is not supported in today’s distributed ledger platform, and it is not clear today how the platform could evolve to support that"
    • Vendor obligation (sending product right now, in terms, lottery, …)
synctext commented 4 years ago

great taxonomy start! Trustchain is consensus-free, thus no electricity for mining. What are two isolated markets that can be merged into a single infrastructure? Preferably with low competition, high fees, low innovation, and incentives to use your open alternative.

synctext commented 4 years ago

Brainstorm... Inspired by this payout platform it would be possible to build market trackers(). A payout is provided if the stock market is either below or above a certain mark. Use our DAO #5143 with shared ownership of Bitcoins.

devos50 commented 4 years ago

I am currently working on a classification/overview of existing electronic market/trade mechanisms based on blockchain technology (for a possible paper). This also includes literature on prediction markets, like Augur. I have collected 130+ (mostly peer-reviewed) articles so far and categorised them. I think there is quite some overlap with topics that you brought up. If you are interested in them, I can share them with you 👍

synctext commented 4 years ago

can you post a list here? + URLs possible even...

devos50 commented 4 years ago

@synctext sure! I'm using the Papers application to organize and categorize all the literature that I read, and I'm currently in the process of reading/summarizing each paper. I think the list below is quite complete regarding academic work, but there are still a few open-source (non-peer-reviewed) implementations missing.

Atomic Swaps

Alt chains and atomic transfers

Atomic Cross-Chain Swaps.

Atomic Cross-chain Swaps - Development, Trajectory and Potential of Non-monetary Digital Token Swap Facilities.

Atomic Cross-Chain Swaps with Improved Space and Time Complexity.

Atomic Crosschain Transactions for Ethereum Private Sidechains.

Atomic Swaptions: Cryptocurrency Derivatives

Cross-chain Deals and Adversarial Commerce.

Extending Atomic Cross-Chain Swaps.

On the optionality and fairness of Atomic Swaps.

On the specification and verification of atomic swap smart contracts.

Privacy-Preserving Cross-Chain Atomic Swaps

The state of atomic swaps

(via @papersapp) http://scholar.google.comjavascript:void(0) https://dblp.org/rec/conf/podc/Herlihy18 http://aetic.theiaer.org/archive/v3/v3n1/p5.html https://dblp.org/rec/journals/corr/abs-1905-09985 https://dblp.org/rec/journals/corr/abs-1904-12079 http://arxiv.org/abs/1807.08644v2 https://dblp.org/rec/journals/pvldb/HerlihySL19 http://link.springer.com/10.1007/978-3-030-31500-9_14 http://dl.acm.org/citation.cfm?doid=3318041.3355460 http://arxiv.org/abs/1811.06099v1 http://fc20.ifca.ai/wtsc/WTSC2020/WTSC20_paper_20.pdf http://diyhpl.us/wiki/transcripts/scalingbitcoin/tokyo-2018/atomic-swaps/

Auctions

PASTRAMI: Privacy-preserving, Auditable, Scalable&Trustworthy Auctions for Multiple Items

Succinctly Verifiable Sealed-Bid Auction Smart Contract.

Verifiable Sealed-Bid Auction on the Ethereum Blockchain.

(via @papersapp) http://link.springer.com/10.1007/978-3-030-00305-0_1 http://link.springer.com/10.1007/978-3-662-58820-8_18

Custodial Exchanges

Bitcoin: Economics, Technology, and Governance

Bitcoin Transaction Malleability and MtGox

Making Bitcoin Exchanges Transparent

Provisions: Privacy-preserving Proofs of Solvency for Bitcoin Exchanges

ShapeShift

Tesseract - Real-Time Cryptocurrency Exchange Using Trusted Hardware.

The Arwen Trading Protocols (Full Version).

Why Preventing a Cryptocurrency Exchange Heist Isn’t Good Enough

(via @papersapp) http://pubs.aeaweb.org/doi/10.1257/jep.29.2.213 https://link.springer.com/chapter/10.1007/978-3-319-11212-1_18 https://link.springer.com/chapter/10.1007/978-3-319-24177-7_28 http://dl.acm.org/citation.cfm?doid=2810103.2813674 https://shapeshift.io http://dl.acm.org/doi/10.1145/3319535.3363221 https://dblp.org/rec/journals/iacr/HeilmanLG20 https://link.springer.com/chapter/10.1007/978-3-030-03251-7_27

Decentralized Exchanges

0x: An open protocol for decentralized exchange on the Ethereum blockchain

A Demonstration of Sterling - A Privacy-Preserving Data Marketplace.

A Distributed Digital Asset-Trading Platform Based on Permissioned Blockchains

Beaver - A Decentralized Anonymous Marketplace with Secure Reputation.

Bisq - The peer-to-peer Bitcoin Exchange

BitShares 2.0: General Overview

Coincer: Decentralised Trustless Platform for Exchanging Decentralised Cryptocurrencies

Decentralized blockchain-based electronic marketplaces

Decentralizing the Stock Exchange using Blockchain An Ethereum-based implementation of the Bucharest Stock Exchange.

Deconstructing Decentralized Exchanges

Enigma Catalyst : A machine-based investing platform and infrastructure for crypto-assets

Etherdelta

Fast and secure global payments with Stellar

Fragmentation of Distributed Exchanges

IDEX: A Real-Time and High-Throughput Ethereum Smart Contract Exchange

IDMoB - IoT Data Marketplace on Blockchain.

Komodo BarterDEX

Kyber Network whitepaper

Localbitcoins

Loopring: A decentralized token exchange protocol

Mind my value - a decentralized infrastructure for fair and trusted IoT data trading.

Open bazaar protocol

Polkadot: Vision for a heterogeneous multi-chain framework

Republic Protocol: A decentralized dark pool exchange providing atomic swaps for Ethereum-based assets and Bitcoin

Resource Control in P2P Cryptocurrency Networks

SmartExchange: Decentralised Trustless Cryptocurrency Exchange

Swap: A Peer-to-peer Protocol for Trading Ethereum Tokens

Waves.Exchange | Buy crypto with 0% fees

XCLAIM: Trustless, Interoperable, Cryptocurrency-Backed Assets

(via @papersapp) https://static.xcj.com/uploads/20180518/xumb4xmv3o1516701172833.pdf http://dl.acm.org/citation.cfm?doid=3229863.3275603 https://link.springer.com/chapter/10.1007/978-3-030-05764-0_6 https://dblp.org/rec/journals/iacr/SoskaKCD16 https://docs.bisq.network/exchange/whitepaper.html https://cryptorating.eu/whitepapers/BitShares/bitshares-general.pdf https://link.springer.com/chapter/10.1007/978-3-319-64701-2_53 https://dl.acm.org/doi/10.1145/3158333 https://ieeexplore.ieee.org/document/8516610/ https://assets.pubpub.org/ob89i66u/61573938834913.pdf https://etherdelta.com http://dl.acm.org/citation.cfm?doid=3341301.3359636 http://arxiv.org/abs/1910.11216v2 https://ieeexplore.ieee.org/document/8525388/ https://files.kyber.network/Kyber_Protocol_22_April_v0.1.pdf https://localbitcoins.com https://raw.githubusercontent.com/Loopring/whitepaper/master/en_whitepaper.pdf http://dl.acm.org/citation.cfm?doid=3131542.3131564 http://docs.openbazaar.com http://scholar.google.comjavascript:void(0) http://arxiv.org/abs/1810.11675v1 https://link.springer.com/chapter/10.1007/978-3-030-04849-5_32 https://swap.tech/whitepaper/ https://waves.exchange https://ieeexplore.ieee.org/document/8835387/

Blockchain-based Energy Trading

Blockchain-based electricity trading with Digitalgrid router

Consortium Blockchain for Secure Energy Trading in Industrial Internet of Things.

Crypto-trading: Blockchain-oriented energy market

Energy trading for fun and profit buy your neighbor's rooftop solar power or sell your own-it'll all be on a blockchain

Implementation of blockchain-based energy trading system

Privacy-Preserving Energy Trading Using Consortium Blockchain in Smart Grid.

Security and Privacy in Decentralized Energy Trading Through Multi-Signatures, Blockchain and Anonymous Messaging Streams.

Towards resilient networked microgrids: Blockchain-enabled peer-to-peer electricity trading mechanism

(via @papersapp) https://ieeexplore.ieee.org/abstract/document/7991065/ http://ieeexplore.ieee.org/document/8234700/ https://ieeexplore.ieee.org/abstract/document/8240547/ https://ieeexplore.ieee.org/abstract/document/8048842/ https://www.emerald.com/insight/content/doi/10.1108/APJIE-12-2017-037/full/html https://ieeexplore.ieee.org/document/8613816/ https://dblp.org/rec/journals/tdsc/AitzhanS18 https://ieeexplore.ieee.org/abstract/document/8245449/

Matchmaking

0x: An open protocol for decentralized exchange on the Ethereum blockchain

Etherdelta

Libra - Fair Order-Matching for Electronic Financial Exchanges.

Loopring: A decentralized token exchange protocol

Swap: A Peer-to-peer Protocol for Trading Ethereum Tokens

The cost of decentralization in 0x and EtherDelta

(via @papersapp) https://static.xcj.com/uploads/20180518/xumb4xmv3o1516701172833.pdf https://etherdelta.com http://dl.acm.org/citation.cfm?doid=3318041.3355468 https://raw.githubusercontent.com/Loopring/whitepaper/master/en_whitepaper.pdf https://swap.tech/whitepaper/ https://hackingdistributed.com/2017/08/13/cost-of-decent/

Prediction Markets

A permissioned blockchain-based implementation of LMSR prediction markets.

A Smart Contract Oracle for Approximating Real-World, Real Number Values

Augur: a decentralized, open-source platform for prediction markets

Decentralized Prediction Market Without Arbiters.

Gnosis whitepaper

(via @papersapp) https://linkinghub.elsevier.com/retrieve/pii/S016792361930257X https://drops.dagstuhl.de/opus/volltexte/2020/11970 http://cryptoverze.com/wp-content/uploads/2019/01/augur.pdf http://link.springer.com/10.1007/978-3-319-70278-0_13 http://scholar.google.comjavascript:void(0)

Security Aspects of Trading

Flash Boys 2.0 - Frontrunning, Transaction Reordering, and Consensus Instability in Decentralized Exchanges.

Footprints on a Blockchain: Trading and Information Leakage in Distributed Ledgers

On the impossibility of fair exchange without a trusted third party

On the optionality and fairness of Atomic Swaps.

Provisions: Privacy-preserving Proofs of Solvency for Bitcoin Exchanges

The cost of decentralization in 0x and EtherDelta

The Decentralized Financial Crisis: Attacking DeFi

Zexe - Enabling Decentralized Private Computation.

(via @papersapp) https://dblp.org/rec/journals/corr/abs-1904-05234 http://jot.pm-research.com/lookup/doi/10.3905/jot.2017.12.3.005 https://pdfs.semanticscholar.org/208b/22c7a094ada20736593afcc8c759c7d1b79c.pdf http://dl.acm.org/citation.cfm?doid=3318041.3355460 http://dl.acm.org/citation.cfm?doid=2810103.2813674 https://hackingdistributed.com/2017/08/13/cost-of-decent/ http://arxiv.org/abs/2002.08099v1 https://dblp.org/rec/journals/iacr/BoweCGMMW18

Settlement Mechanisms

A protocol for interledger payments

Atomically Trading with Roger - Gambling on the Success of a Hardfork.

Blockchain-based secure digital asset exchange scheme with QoS-aware incentive mechanism.

Blockchain-based settlement for asset trading

Blockchain router: A cross-chain communication protocol

Bootstrapping a Blockchain Based Ecosystem for Big Data Exchange.

Centrally Banked Cryptocurrencies.

Cross-asset trading within blockchain networks

Cross-Chain Payment Protocols with Success Guarantees

DeXTT - Deterministic Cross-Blockchain Token Transfers.

Escrow Protocols for Cryptocurrencies - How to Buy Physical Goods Using Bitcoin.

Fair and Decentralized Exchange of Digital Goods.

Fair Two-Party Computations via Bitcoin Deposits.

FairSwap - How To Fairly Exchange Digital Goods.

Hallex: A trust-less exchange system for digital assets

How to Use Bitcoin to Design Fair Protocols.

Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains

HyperPubSub - Blockchain Based Publish/Subscribe.

Implementing an Asset Trading System Based on Blockchain and Game Theory

Liquid Speed: On-Demand Fast Trading at Distributed Exchanges

Market design for trading with blockchain technology

On the impossibility of fair exchange without a trusted third party

Optimistic Protocols for Fair Exchange.

PEX - Privacy-Preserved, Multi-Tier Exchange Framework for Cross Platform Virtual Assets Trading.

Pisa - Arbitration Outsourcing for State Channels.

Proof of Delivery of Digital Assets Using Blockchain and Smart Contracts.

Real-time Money Routing by Trusting Strangers with your Funds.

Resource trading in blockchain-based industrial Internet of Things

SDTE - A Secure Blockchain-Based Data Trading Ecosystem.

Towards Decentralized Equilibrium Asset Trading Based on Blockchain.

Trading Real-World Assets on Blockchain - An Application of Trust-Free Transaction Systems in the Market for Lemons.

Trading Stocks on Blocks - Engineering Decentralized Markets

Trust Is Risk - A Decentralized Financial Trust Platform.

Usable optimistic fair exchange.

XChange - A Blockchain-based Mechanism for Generic Asset Trading In Resource-constrained Environments.

XCLAIM: Trustless, Interoperable, Cryptocurrency-Backed Assets

(via @papersapp) http://blockchainlab.com/pdf/interledger.pdf http://link.springer.com/10.1007/978-3-319-67816-0_19 https://ieeexplore.ieee.org/document/8808111/ https://academic.oup.com/rfs/article-abstract/32/5/1716/5427772 http://ieeexplore.ieee.org/document/8029360/ https://dblp.org/rec/conf/ndss/DanezisM16 https://patents.google.com/patent/US20190311351A1/en http://arxiv.org/abs/1912.04513v1 http://arxiv.org/abs/1905.06204v1 http://link.springer.com/10.1007/978-3-319-70972-7_18 https://dblp.org/rec/journals/corr/abs-2002-09689 http://link.springer.com/10.1007/978-3-662-44774-1_8 https://dl.acm.org/doi/10.1145/3243734.3243857 https://dblp.org/rec/journals/iacr/BentovK14 http://dl.acm.org/citation.cfm?doid=3190508.3190538 https://ieeexplore.ieee.org/document/9049532/ https://ieeexplore.ieee.org/abstract/document/8945822/ http://arxiv.org/abs/1907.10720v1 http://blockchain.cs.ucl.ac.uk/wp-content/uploads/2016/11/Paper_18.pdf https://pdfs.semanticscholar.org/208b/22c7a094ada20736593afcc8c759c7d1b79c.pdf http://portal.acm.org/citation.cfm?doid=266420.266426 https://ieeexplore.ieee.org/document/9045515/ http://dl.acm.org/citation.cfm?doid=3318041.3355461 https://ieeexplore.ieee.org/document/8501910/ https://ieeexplore.ieee.org/document/8696786/ https://ieeexplore.ieee.org/abstract/document/8657779/ https://ieeexplore.ieee.org/document/8759960/ https://ieeexplore.ieee.org/document/8855688/ http://link.springer.com/10.1007/s12599-017-0499-8 https://link.springer.com/chapter/10.1007/978-3-319-59144-5_34 https://dblp.org/rec/journals/iacr/LitosZ17 https://linkinghub.elsevier.com/retrieve/pii/S138912861100301X http://arxiv.org/abs/2004.05046v1 https://ieeexplore.ieee.org/document/8835387/

Various

Attacking the DeFi Ecosystem with Flash Loans for Fun and Profit.

Beware the Middleman - Empirical Analysis of Bitcoin-Exchange Risk.

Blockchain-enabled Intelligent Asset Exchange for a Circular Economy.

Challenges and Opportunities Associated with a Bitcoin-Based Transaction Rating System.

Dispute Resolution for Smart Contract-based Two-Party Protocols.

Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains

Measuring the Longitudinal Evolution of the Online Anonymous Marketplace Ecosystem.

Overview of Emerging Blockchain Architectures and Platforms for Electronic Trading Exchanges

Peer Review - Toward a Blockchain-enabled Market-based Ecosystem.

Shared Ledger Accounting - Implementing the Economic Exchange pattern.

The Economics of Cryptocurrency Pump and Dump Schemes

The Emerging Role of Electronic Marketplaces on the Internet.

Tokenization: Open Asset Protocol on Blockchain

Towards atomic cross-chain token transfers: State of the art and open questions within tast

(via @papersapp) http://arxiv.org/abs/2003.03810v2 http://link.springer.com/10.1007/978-3-642-39884-1_3 https://dblp.org/rec/journals/ercim/AskoxylakisAD17 http://link.springer.com/10.1007/978-3-662-44774-1_3 https://ieeexplore.ieee.org/document/8751312/ http://dl.acm.org/citation.cfm?doid=3190508.3190538 https://dblp.org/rec/conf/uss/SoskaC15 http://www.ssrn.com/abstract=2867344 https://dblp.org/rec/journals/cais/Avital18 https://linkinghub.elsevier.com/retrieve/pii/S0306437919304892 https://papers.ssrn.com/abstract=3310307 http://portal.acm.org/citation.cfm?doid=280324.280330 https://ieeexplore.ieee.org/abstract/document/8711021/ https://dsg.tuwien.ac.at/projects/tast/pub/tast-white-paper-1.pdf

jverbraeken commented 4 years ago

Thanks a lot for the suggestions @synctext @devos50 👍🏻👍🏻 I hadn't heard of Augur yet, very interesting project indeed! Will change my proposed taxonomy a bit to allow for prediction markets. Would be great if you could share your summaries of those papers @devos50! My holiday is over unfortunately, from July onwards I'll dedicate all my time to this thesis project

jverbraeken commented 4 years ago

great taxonomy start! Trustchain is consensus-free, thus no electricity for mining. What are two isolated markets that can be merged into a single infrastructure? Preferably with low competition, high fees, low innovation, and incentives to use your open alternative.

I'd need to look more into that. At first sight, I'd say that Tinder is a great example since they have a near-monopoly, very high fees, and close to no innovation.

devos50 commented 4 years ago

I'd need to look more into that. At first sight, I'd say that Tinder is a great example since they have a near-monopoly, very high fees, and close to no innovation.

See Matchpool: dating on a blockchain

You can collect 'arrows' by proposing love interests. You are rewarded with more arrows as the relationship advances and gets more intimate 😄.

jverbraeken commented 4 years ago

Haha nicee, awesome project 😂😂

synctext commented 4 years ago

Coming 6 weeks:

Universal market research questions from kick-off presentation:

Existing:

jverbraeken commented 4 years ago

image

synctext commented 4 years ago

Please read the current status of our global AI marketplace tooling: https://github.com/Tribler/tribler/files/4929974/Dollynator.-.Final.Report.pdf After contributions by 31 students in various courses over the years, it is quite sophisticated. Bug-free also obviously :wink:

jverbraeken commented 4 years ago
15 additional screenshots ![2 1-Screen 1](https://user-images.githubusercontent.com/15014738/88368770-02d8c700-cd8f-11ea-8fc0-55eb288ce47b.png) ![3 1-Screen 2](https://user-images.githubusercontent.com/15014738/88368771-02d8c700-cd8f-11ea-8e59-6073b5bc0301.png) ![4 1-Screen 5](https://user-images.githubusercontent.com/15014738/88368772-02d8c700-cd8f-11ea-887c-7b0bb6f8fd03.png) ![5 1-Screen 6](https://user-images.githubusercontent.com/15014738/88368774-03715d80-cd8f-11ea-9a85-cc72a7e8421c.png) ![6 1-Screen 3](https://user-images.githubusercontent.com/15014738/88368775-03715d80-cd8f-11ea-9a9a-63181e94a60c.png) ![7 1-Screen 4](https://user-images.githubusercontent.com/15014738/88368776-03715d80-cd8f-11ea-92bc-d1dca724c5ab.png) ![8 1-Screen 7](https://user-images.githubusercontent.com/15014738/88368778-0409f400-cd8f-11ea-8194-bb0d3d59f91f.png) ![9 1-Screen 8](https://user-images.githubusercontent.com/15014738/88368779-0409f400-cd8f-11ea-81b2-495eefbee7ec.png) ![10 1-Screen 9](https://user-images.githubusercontent.com/15014738/88368780-04a28a80-cd8f-11ea-8dbe-a24b6afdecca.png) ![11 1-Screen 14](https://user-images.githubusercontent.com/15014738/88368781-04a28a80-cd8f-11ea-98c2-c2b3726256f8.png) ![12 1-Screen 11](https://user-images.githubusercontent.com/15014738/88368782-04a28a80-cd8f-11ea-9c36-6d881fcfe3af.png) ![13 1-Screen 12](https://user-images.githubusercontent.com/15014738/88368783-053b2100-cd8f-11ea-9890-1ff36fb36625.png) ![14 1-Screen 13](https://user-images.githubusercontent.com/15014738/88368784-053b2100-cd8f-11ea-96d0-49e9dfa0e773.png) ![15 1-Screen 15](https://user-images.githubusercontent.com/15014738/88368785-053b2100-cd8f-11ea-996e-13f6b688e5cc.png) ![16 1-Screen 16](https://user-images.githubusercontent.com/15014738/88368786-05d3b780-cd8f-11ea-99fd-376eb74d2461.png)
synctext commented 4 years ago
devos50 commented 4 years ago

Some (peer-reviewed) work related to your ideas: Domain Ontology for Digital Marketplaces.

jverbraeken commented 4 years ago

image

synctext commented 4 years ago

Ideas for thesis focus:

jverbraeken commented 4 years ago

We choose to focus first on federated learning (instead of distributed learning), because this field is less researched and poses more significant challenges regarding incentives/privacy/security. When this works, extending the functionality to support distributed learning is fairly trivial.

Some literature: image

Must-haves

Step 1

Create IPv8 overlay in SuperApp and let it run on a few servers, see:

Step 2

Successfully train a CNN on the MNIST database on a single node

Step 3

Implement most basic gossiping protocol for distributed CNN-training: https://arxiv.org/pdf/1611.04581.pdf

Should-haves

Step 4

Prevent attacks (sybil + injection)

  1. Strong identities by giving people €0,01 when they use the app and hashing their bank account
  2. Peers select models by (1) taking a sample of models based on the reputation of the peers that created them, and (2) test their performance on their local data, take the 3 best performing models, and take the average of them
  3. Letting everyone verify if contribution of a random other person improves the accuracy based on his/her own data => put this on the blockchain (RONI)
    • Create block type II: blk_reputation
    • Establish reputation based on KRUM, median, FoolsGold, ...
    • Problem: doesn't work well for non-IID environments, see: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9066920
    • Put most emphasis on most recent interactions in calculating someone's reputation
    • Incentive mechanism: limit functionality of app when people don't train the model
    • Reputation calculation also based on how well a peer evaluates other peers

Literature:

Step 5

Use differentially private noise to enhance privacy

Could-haves

Step 6

Replace SGD (that everyone uses, why???) with something way better https://ruder.io/optimizing-gradient-descent/index.html#whichoptimizertouse

Step 7

Develop a general supply/demand platform where people can request other people to train their models, people choose which models they want to train

Step 8

Take the average of lots of models in the early stages of the network (initially large fluctuations) and phase this averaging out when the model eventually converges

jverbraeken commented 4 years ago

Research questions:

  1. How can we prevent sybil/injection attacks in a decentralized federated learning system?
  2. Practically all papers use SGD to train decentralized neural networks; would better parameter optimization methods also work well in decentralized settings?
  3. Can we replace a blockchain to store the model parameters with an eventually consistent system (trustchain) to achieve superior performance?
synctext commented 4 years ago

Now it is really secure :laughing: ; key thesis selling point. Key related work: Biscotti.

Concrete thesis direction proposal: federated learning on edge devices is a new emerging field with significant promise, but numerous unsolved challenges around privacy, security, and incentives. This thesis will significantly address the state-of-the-art by demonstrating a performance leap beyond classical Gradient Decent, security advancement without reliance on external trusted third parties or services, address the privacy issue without introducing brittle zero-knowledge proofs, and provide incentive alignment. Our solution relies on secured gossip to address model pollution and Sybil attack by providing easy-to-use trust framework which utilises Internet latency diversity, graph analysis of user activity, and transaction properties for attack-resilience.

Concrete goal: in 6 weeks a "Poisoning attack" (Wednesday 21 Oct). Possible future sprints past this date:

jverbraeken commented 4 years ago

Update September 23

  1. Created new superapp module
  2. Ping-pong between 2 devices
  3. Train MNIST dataset on single phone
  4. Simulate distributed MNIST training on single phone
  5. Create an faster TFTP-alternative by sending ~50 messages simultaneously (speedup: ~11x faster)
  6. Implement weighted averaging
  7. Train MNIST dataset distributed on 2 phones
synctext commented 4 years ago
synctext commented 4 years ago

Please use the highly cited more modern HAL dataset, instead of MNIST. Deep Learning Towards Mobile Applications

jverbraeken commented 3 years ago

Update October 7 screenshot

  1. Implement UTP library (speedup ~9x faster)
  2. Add CIFAR-10 and UCI-HAR dataset + well-performing neural networks
  3. Redesign UI
  4. Store relevant training information in custom log files that are easily accessible for the user
  5. Compress model updates before sending them over the network
  6. Enable the user to show/hide log entries from specific classes

Issue Simultaneously receiving/sending packets sometimes crashes => packets not always received by the listener

Links https://github.com/jverbraeken/trustchain-superapp https://github.com/jverbraeken/kotlin-ipv8 https://1drv.ms/u/s!AvNMRY4ml2WPgZhA-uL0CvZys_FopA?e=1tEZyo

synctext commented 3 years ago

Great progress again!

Red warning signs appear when a field produces a Survey-of-Surveys. Solid paper actually with 205 citations around Mobile Edge Computing (MEC). image

jverbraeken commented 3 years ago

Update October 22

  1. Solved the bug in UTP => sending and receiving simultaneously works and is stable
  2. Configure and start machine learning task directly from command line
  3. Thorough literature research
  4. Started on literature review / thesis document (see https://1drv.ms/w/s!AvNMRY4ml2WPgZhhnJnQjTpIwT-5vw?e=WOJUlo)
  5. Further delineating of thesis topic: "Byzantine-resilient, asynchronous, yet decentralized federated learning"

Main idea: Improve current state-of-the-art byzantine defense algorithm Mozi, see https://www.groundai.com/project/towards-byzantine-resilient-learning-in-decentralized-systems/

  1. Main innovation 1: almost all current methods require a (significant) set of incoming samples to be able to take the average or median or something similar. I want to use a "moving window kind of thing" to let the agent update its model immediately when a new samples arrives
  2. Main innovation 2: almost all papers use simple averaging; I want to use weighted averaging based on (a) how well a model scores and (b) how frequent a peer has already sent data (frequency filter, see https://arxiv.org/pdf/1802.07928.pdf)
  3. Main innovation 3: almost all current algorithms require i.i.d. data. I want to loosen this unrealistic assumption (agent A has classes X and Y, agent B has classes Y and Z) and allow for non-i.i.d. data by sending the updated model only to nodes that are quite similar in terms of private data. I will do this by checking the cardinality of the private set intersection between the data labels of both nodes.
  4. Evaluate the impact of a warm-start
  5. Usage of ring communication (and compare this approach with random peers, close neighbors, etc.)
  6. Be the first paper in this field to distinguish between exploration (asking other peers to send you their model) and exploitation (improving your own model), and formalizing the trade-off between these things
  7. (Comparison with improved BRIDGE (BRIDGE cuts off the top #maxbyzantine and bottom #maxbyzantine incoming values. I reduce this 2b constraint to b by searching for the "most coherent" logical sequence of data (bit like Minimum-Diameter Averaging)))
  8. (Maybe something with PCA to filter out byzantine values (probably doesn't work because there are many (!!!) more dimensions than training samples))
  9. (Maybe something with RANSAC, particle filters, iterated/iterative filtering, Monte Carlo, ... These methods might also be useful to filter out byzantine values)

Other idea: (I think this approach yield worse performance than the former, because nodes can abuse their reputation, can make use of the fact that they don't have a reputation yet, can assign excellent reputation to malicious nodes or the other way around. Additionally, it still depends on an algorithm that combines incoming models (which should be, in my opinion, the algorithm described above) Trustchain-enabled reputation-assisted federated learning

  1. First, every node receives models from other nodes + half blocks with the hash of the model signed by the sender
  2. Node checks reliability of peers by checking the trustchain
  3. Then the node tests every model with his own local data and puts on the trustchain another half block including the accuracy
  4. The best performing models are combined with the BRIDGE algorithm and then further trained with the node's local data samples
  5. The node tests if this final model performs better than his old model
  6. If new model performs worse, throw it away
  7. If new model performs better, send it to other nodes along with a half block containing the hash of the new model, signed with the node's private key

Question to Martijn/Johan:

synctext commented 3 years ago
devos50 commented 3 years ago

I've read the MOZI paper in detail to get a better understanding of the field and techniques used.

I see how their approach is novel (combining distance-based and performances-based vector selection), however, I am slightly confused by the experimental results. Figure 6 shows that MOZI is indeed resilient against common attacks, but not more significantly than DBulyan. In fact, MOZI sometimes exhibits less accuracy than other aggregation rules, e.g., simple average, even under the sophisticated attack (Figure 6c). The "average" strategy actually seems to perform very well in most cases, and shows decent convergence. The authors claim that "the advantage of MOZI over other strategies is obvious" but I strongly disagree with this claim. Figure 7 shows "average" is also very performant compared to MOZI. What are then the main reasons why one should pick MOZI? Am I missing something here?

Some other comments/questions:

jverbraeken commented 3 years ago

WIP update November 9

  1. Fixed several super nasty bugs with UTP (ConcurrentHashMap instead of regular HashMap, Dispatchers.IO context for coroutine instead of default context, incorrect timing between threads, etc.)
  2. Discovered that restarting my laptop when packages are not being delivered anymore often magically solves the problem and endless hours of debugging
  3. Sending and receiving UTP from/to many peers simultaneously works correctly
  4. Created batch script / Visual Basic file to redirect localhost ports to the emulator ports + changed source code of android-ipv8 to send updates to peers with the same IP-address to localhost instead of the network
  5. Everything running stable on 4 emulators (together with Android Studio consuming 27GB of RAM)
    • However, only with the tiny MNIST dataset to prevent out-of-memory exceptions
    • More emulators gets really slow
  6. Create for every peer a random (seed=port) training/test dataset with a user-configurable number of samples per class
  7. User can select communication pattern
    • Communicate updates to everyone (too slow, discommend using this option at the moment)
    • Communicate updates to random peer (during my first trial run this resulted in the sequence: 59235, 59243, 59243, 59235, 59235, 59235, 59235, 59235, 59243. This is random, but clearly suboptimal)
    • Communicate updates in a round-robin schedule
    • Communicates updates in a ring schedule (e.g. peer 1, 2, 4, 8, 16, 32, ..., 1, 2, 4, ...)
  8. Implemented Coordinate-Wise Median, Trimmed Mean, Krum, BRIDGE, and MOZI
  9. Implemented 3 behaviors (including 2 data poisoning attacks): benign, noise attack, label-flip attack
  10. Implemented 3 model poisoning attacks (Fang, 2020): https://www.usenix.org/system/files/sec20summer_fang_prepub.pdf
  11. Full redesign of the UI
    • newui

Also discovered some weird quirk in MOZI: when all other peers have a model that results in worse performance than the peer's own model, the peer will still average its own model with the model of the best other peer.

jverbraeken commented 3 years ago

@devos50 Thanks for your comment!

I see how their approach is novel (combining distance-based and performances-based vector selection), however, I am slightly confused by the experimental results. Figure 6 shows that MOZI is indeed resilient against common attacks, but not more significantly than DBulyan. In fact, MOZI sometimes exhibits less accuracy than other aggregation rules, e.g., simple average, even under the sophisticated attack (Figure 6c). The "average" strategy actually seems to perform very well in most cases, and shows decent convergence. The authors claim that "the advantage of MOZI over other strategies is obvious" but I strongly disagree with this claim. Figure 7 shows "average" is also very performant compared to MOZI. What are then the main reasons why one should pick MOZI? Am I missing something here?

Well, simple average results indeed in better performance than MOZI, but the authors mention that "For baseline, we consider the same decentralized system configuration without Byzantine nodes, and using the Average Aggregation rule (Equation 2). The model trained from this setting can be regarded as the optimal one."

The paper mentions two use-cases for decentralized learning: Internet-of-Vehicle and recommendations in social networks. The latter use-case is close to what we aim to do in our lab, also see #3752. There should be another open issue for music recommendation somewhere. The evaluation of the paper is not tied to a specific use-case, which potentially undermines their results since it is not a realistic experimental setup with respect to an application domain. You should keep this in mind for your own evaluation.

Thanks! I'll keep that in mind 👍🏻. The HAR dataset (that is implemented in the current version) is a pretty realistic use-case I think.

I noticed that the paper assumes that network messages do not get lost. Their Byzantine behaviour only considers sending carefully-crafted estimates to neighbours.

Mmh, yeah that's correct, and currently the same on my setup because messages don't get lost between emulators since all network traffic is on the same computer. However, I think it's reasonable to make the attackers as strong as possible to simulate a worst-case scenario (attackers are weaker when their packets get lost on the way to the benign node).

As we also discussed last week, a key design decision of your system is synchronous learning vs asynchronous learning. The MOZI authors opt for synchronous learning.

Exactly! Asynchronous learning with lots of nodes that are significantly lagging behind is much more challenging that synchronous learning and has received relatively little attention from the scientific community. The authors also assume an i.i.d. dataset instead of a non-i.i.d. dataset which makes everything a lot easier. I think I'll focus first on tackling the non-i.i.d. challenge (since the i.i.d. assumption is completely ridiculous in real-life) and maybe in a later stage focus on making the algorithm work in asynchronous settings (because synchronous settings are actually realistic in real-life; you can "simply" synchronize every day with the other nodes)

Why is it not realistic to add an additional validation dataset to a centralised parameter server?

Good question! It is very realistic and there is a lot of literature using this approach. However, it can be hard to get enough data because the data might be very privacy-sensitive (e.g. Whatsapp messages, Tinder recommendations, or photos). Additionally, unless you get all data from all users (in which case federated learning doesn't make sense) you will have a lot more validation data when users validate incoming updates themselves (this doesn't always result in better performance because the validation data is scattered across all users, but hey, topic for future research).

  • It was not clear to me how the "connection ratio" is defined. Was this described in the paper?

No, the authors forgot to mention that. I think that connection ratio refers to the fraction of nodes with which every node communicates its updates

devos50 commented 3 years ago

Possibly related work:

synctext commented 3 years ago
jverbraeken commented 3 years ago

Minor update:

Structure of thesis

thesisstructure

Results

(I'm not sure if this is the way to go... There are a lot of different variables that can be combined in a lot of different ways...) results

Pseudocode

pseudocode

Pseudocode old version, you can skip this, perhaps slightly more readable than the formal version

Main:

similarPeers <- getSimilarPeers()
network = createNetwork()
trainDataSetIterator, testDataSetIterator = createIterators()

repeat:
    network.fit(trainDataSetIterator.nextBatch()
    if models received from other peers:
        averageParams = integrateParameters(network)
        network.setParameters(averageParams)
        recentOtherModels.add(newOtherModels)
        limit recentOtherModels to the 20 most recent models
    endif
end repeat

integrateParameters:

distances = euclideanDistance(ownModel, newOtherModels, recentOtherModels)
exploitationModels = first X models with the smallest distance
Exclude non-new models from exploitationModels
explorationModels = from (distances \ exploitationModels \ recentOtherModels), randomly select Y models
combinedModels = exploitationModels ∪ explorationModels
calculate for the peer's own model per class the loss on a random data sample
calculate for all combinedModels the loss on the same random data sample
calculate weight for every combined model
set new model to weighted average of combined models

calculateWeights

for every peer:
    C = cardinality of PSI with peer
    bestClasses = the C classes with the lowest loss for the other peer
    smallestOtherLoss = sum of the losses of the other peer for bestClasses
    smallestOwnLoss = sum of the losses of self for bestClasses
    weight = max(0, 8 - 4 * (smallestOtherLoss / smallestOwnLoss)

getSimilarPeers: psi

synctext commented 3 years ago

:clap: :fireworks: :clap: You have reached the milestone of a thesis conceptual outline! (towards end-March) Now I would recommend keeping focus on getting state-of-the-art results, and possibly tweak until you get solid results. No gap between simulation and real-world ML. practical stuff on port forwarding emulators

Key focus: how to sell your idea

Experiments:

jverbraeken commented 3 years ago

Probleem: Mijn oplossing werkt erg slecht als de nodes hun onvolledige neural networks moeten combineren tot 1 gezamenlijk neural network. Hoe kan dit worden opgelost? Door gebruik te maken van een geavanceerde techniek genaamd Elastic Weight Consolidation. Ik heb iets van 2 weken eraan zitten werken, maar deeplearning4j (de enige machine learning library die werkt op Java) is totaal niet geschikt voor het implementeren van deze techniek. Tensorflow is hier véél geschikter voor, maar werkt alleen op Python (helaas zijn Android apps altijd in Java). Wat zijn de mogelijkheden?

  1. Geen Android app maken maar een normale Windows/Linux app (maar dan gooi ik wel effectief maanden aan werk weg, aangezien veel tijd zat in code specifiek in de superapp)
  2. Mijn algoritme niet geschikt maken voor non-i.i.d. data (wat nou juist de belangrijkste innovatie was van het hele algoritme)
  3. Gebruik maken van een andere techniek. Hierbij moet ik veronderstellen dat een klein gedeelte van de training data wel degelijk publiek beschikbaar is (redelijke veronderstelling, maar toch jammer dat ik deze moet maken). Deze techniek is bovendien minder effectief. Ik kan wel eventueel daarnaast in Python wel de effectiviteit van Elastic Weight Consolidation laten zien, en dan in mijn thesis beargumenteren dat zodra Tensorflow ook op Android werkt we dan alle reden hebben om aan te nemen dat met Elastic Weight Consolidation mijn algoritme nog een stuk effectiever is.

De laatste optie heeft momenteel mijn voorkeur.

jverbraeken commented 3 years ago

Update December 12

  1. Researched EWC for 2 weeks, but this technique is with the current libraries available for Android software development infeasible (see post above).
  2. Automated generation of all figures with one click of a button
  3. Re-added CIFAR-10 and HAR to the available datasets (needed to be rewritten so that we can get more fine-grained insight into their performance)

Specific scenario's in which my algorithm will probably outperform other algorithms

  1. When our node generates very litle data while all other nodes generate tons of data
  2. When (for example due to high bandwidth costs) other nodes only send very seldom an update of their parameters
  3. When our node joins in quite late, when all other nodes already have much better performance
  4. When the nodes have data of different classes (with some overlap in-between)

Other announcement:

  1. A few weeks ago I was pretty busy at De Kleine Consultant (consultancy organization in Delft), so I could spend less time on my thesis. Mid-January I'll manage a team as a Project Manager which will take ~15-20 hours per week. That is why I plan to finish my thesis probably a month later (end of April).
synctext commented 3 years ago

ToDo for next meeting:

jverbraeken commented 3 years ago

Update January

  1. Simulation results ready (results below are slightly outdated, fixed several bugs in the meantime)
  2. Huge performance improvements (+- 24h => +- 6h to run the entire simulation)
  3. Bug fixing in TFTP/UTP => fully distributed environment working
  4. Cannot use the emulators anymore :(:(:(

Figure 0 1 Figure 4 2 Figure 4 1 Figure 3 4 Figure 3 3 Figure 3 2 Figure 3 1 Figure 2 3 Figure 2 2 Figure 2 1 Figure 1 3 Figure 1 2 Figure 1 1 Figure 0 4 Figure 0 3 Figure 0 2

synctext commented 3 years ago
jverbraeken commented 3 years ago

Update January 27th

  1. Finally got it working on the Zulu server => first results with 50 nodes (note: training takes 24 hours for test with 50 nodes, undetermined but significantly longer time for most complex model with 250 nodes). Unfortunately, the results don't seem to make sense, so have to do some bug fixing...
  2. Also finished first distributed test with the maximum number of 16 nodes. Performance is quite a bit lower than in the non-distributed setting. Again lots of debugging needed
  3. Finished significant part of my actual thesis
  4. Determined research direction for coming weeks: achieving good performance in non-i.i.d. settings. A trivial rehearsal strategy yields okay-ish results, but the performance is not excellent. Techniques such as Elastic Weight Consolidation, SI, AR1, or AR* are all dependent on advanced custom loss functions (something that the library DeepLearning4j does not support :(:(:( ). However, it would very interesting to investigate other methods such as CWR+ and Generative Replay. These performance of these methods has never been investigated in a federated setting before and should significantly better performance than the methods currently being used (namely Rehearsal-based methods and EWC)

Thesis.pdf

synctext commented 3 years ago
jverbraeken commented 3 years ago

Update February 29th

https://www.tudelft.nl/en/student/eemcs-student-portal/education/graduation-msc/green-light

Untitled

Picture1

jverbraeken commented 3 years ago

Thesis.pdf

jverbraeken commented 3 years ago

 Figure MaxBound WISDM _ noniid - bound  Figure MaxBound WISDM _ noniid - regular  Figure MaxBound WISDM _ noniid - transfer  Figure MinBound CIFAR-10 _ noniid - bound  Figure MinBound CIFAR-10 _ noniid - regular  Figure MinBound CIFAR-10 _ noniid - transfer  Figure MinBound MNIST _ noniid - bound  Figure MinBound MNIST _ noniid - regular  Figure MinBound MNIST _ noniid - transfer  Figure MinBound WISDM _ noniid - bound  Figure MinBound WISDM _ noniid - regular  Figure MinBound WISDM _ noniid - transfer  Figure 0 1 - regular  Figure 0 1 - transfer  Figure 0 2 - regular  Figure 0 2 - transfer  Figure 0 3 - regular  Figure 0 3 - transfer  Figure 1 1 - regular  Figure 1 1 - transfer  Figure 1 2 - regular  Figure 1 2 - transfer  Figure 1 3 - regular  Figure 1 3 - transfer  Figure 2 1 - regular  Figure 2 1 - transfer  Figure 2 2 - regular  Figure 2 2 - transfer  Figure 2 3 - regular  Figure 2 3 - transfer  Figure 3 2 - regular  Figure 3 2 - transfer  Figure 3 3 - regular  Figure 3 3 - transfer  Figure 4 2 - regular  Figure 4 2 - transfer  Figure 4 3 - regular  Figure 4 3 - transfer  Figure 4 4 - regular  Figure 4 4 - transfer  Figure 4 5 - regular  Figure 4 5 - transfer  Figure 5 1 - regular  Figure 5 1 - transfer  Figure 5 2 - regular  Figure 5 2 - transfer  Figure 5 3 - regular  Figure 5 3 - transfer  Figure MaxBound CIFAR-10 _ noniid - bound  Figure MaxBound CIFAR-10 _ noniid - regular  Figure MaxBound CIFAR-10 _ noniid - transfer  Figure MaxBound MNIST _ noniid - bound  Figure MaxBound MNIST _ noniid - regular  Figure MaxBound MNIST _ noniid - transfer Thesis.pdf

synctext commented 3 years ago
devos50 commented 3 years ago

I did a high-level screening of your thesis. Great work and good content! Below you can find my comments:

Overall major feedback:

Overall minor feedback:

Subtitle:

Intro:

Chapter 2:

Chapter 3:

Chapter 4:

Chapter 5: