Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

msc placeholder: 5G overlay infrastructure for decentralised learning ??? #7258

Open synctext opened 1 year ago

synctext commented 1 year ago

Thesis defense target: 21 June 2024. Survey target: end of July 2023. Would like to have a fresh master thesis topic, not incremental improvement of other thesis work. Starting roughly Q1 2023 or summer of 2023, flexible.

Update: starting literature survey 2nd May.

Update 2: literature survey finished 3 Oct 2023.

RTOS expertise. AWS. Dream of contributing to The Linux Kernel. Byte-level stuff OK, even assembly person in the age of Javascript :-) Like to use machine learning, but not invent new ML stuff or central focus of thesis (no unsupervised learning, no online learning). Thus more ML that is: adversarial, byzantine, decentralised, personalised, local-first AI, edge-devices only, low-power hardware accelerated. Prefer to utilise advanced algorithms msc course knowledge.

Possible brainstorm starting idea: start building the fastest machine learning based on hardware acceleration. First step is get the hardware running fast, stepwise modify algorithms and tweak towards machine learning for learn-to-rank, learn-through-consumption, or even learn-about-trust (reputation graph, work graph, MeritRank inspired etc). Promised phones to test.

synctext commented 1 year ago

Concrete idea for NAT survey

This survey describes the progress towards a fully connected Internet. Currently, mobile devices do not fully participate in the network: smartphones are unable to receive messages from others. Only Facebook, Google, and other servers in the cloud are able to communicate with billions of smartphone users. In the name of security, billions of users have a constrained network, without freedom to communicate.

Find on Scholar (year, then scientific article or report):
2000 SIP, NAT, and Firewalls - master thesis KTH
2003 Network convergence and the NAT/Firewall problems
2005 Characterization and measurement of tcp traversal through nats and firewalls
2006 Implementation and performance study of a new NAT/firewall signaling protocol
2008 A Better Approach than Carrier-Grade-NAT
2008 Free-riding, fairness, and firewalls in p2p file-sharing
2009 A measurement of NAT and firewall characteristics in peer-to-peer systems
2011 Delft work UDP NAT and Firewall Puncturing in the Wild
2011 Tribler: P2p media search and sharing
2013 Assessing the impact of carrier-grade NAT on network applications
2013 Common requirements for carrier-grade NATs (CGNs)
2013 A Royal Opinion on Carrier Grade NATs
2013 BT Retail Tests IP Address Sharing
2014 On the performance and fairness of BitTorrent-like data swarming systems with NAT devices
2014 Deterministic Address Mapping to Reduce Logging in Carrier-Grade NAT Deployments
2016 Carrier-grade NAT—is it really secure for customers? A test on a Turkish service provider
2016 A multi-perspective analysis of carrier-grade NAT deployment
2016 Statistical network monitoring: Methodology and application to carrier-grade NAT
2016 Overudp: Tunneling transport layer protocols in udp for p2p application of ipv4
2018 Inferring carrier-grade NAT deployment in the wild
2018 IETF Internet Standard draft on Trustchain
2020 birthday paradox solution https://tailscale.com/blog/how-nat-traversal-works/
2020 https://github.com/danderson/nat-birthday-paradox
2021 A QUIC (K) Way Through Your Firewall?
2021 Hardware details, Fortigate: https://news.ycombinator.com/item?id=27489797
2022 How NAT traversal works — NAT notes for nerds
2023 Doomed to Repeat with IPv6? Characterization of NAT-centric Security in SOHO Routers

Taken from the master thesis of 2000: image

ToDo1: 30 citations to carrier grade NAT, and all these topics. ToDo2: taxonomy list, https://www.rfc-editor.org/rfc/rfc3234

Finally, we investigated various telecom providers in The Netherlands about their NAT and blocking practices. We procured 12 SIM cards and measured their behavior. See full connectivity matrix of Sim-to-Sim card. Only 3 offer free Internet... {ToDo}.

TODO: register at https://mare.ewi.tudelft.nl/project :memo:

OrestisKan commented 1 year ago

@synctext registration is for the thesis not for the literature survey no ?

synctext commented 1 year ago

indeed, it's nice if you register your thesis as early as possible.

OrestisKan commented 1 year ago

Literature_Survey.pdf

synctext commented 1 year ago

Feel free to add a bit more content on reproducing state-of-the-art literature.

The scientific problem of universal connectivity is not explained clearly. The storyline goes too fast: page 2 already has "port-restricted cone NAT". Take .5 page for a tutorial on the concept of an incoming connection. Need structure!

Section 5. Reproducing results from literature. After presenting the 34 relevant prior works covered in this survey, we now combine the state-of-the-art results. Using a practical experimental evaluation, we reproduced the best-of-class algorithms presented in the discovered literature. We confirmed the findings of the body of literature within our reproduction experiment. Our simple app reproduces the NAT penetration algorithms of the main literature [2,5,17]. The cardinal outcome of our experimental CGNAT evaluation is the success rate, something often lacking in studies. The success rate for various Dutch telecom providers is determined to be: 97%. ETC.

EDIT: brainstorm about master thesis focus. Idea for title: "5G overlay infrastructure for edge-based decentralised learning". Context to sell your perfect overlay effort. Only need a few weeks doing a minimal-viable-product of decentralised machine learning. Simply take this gossip-based ML algorithm and running code. Goal: 100 actual nodes {mixed real ARM Android and x86 Kotlin}!

OrestisKan commented 1 year ago
  1. Polish text
  2. Create taxonomy table with the literature (from the literature survey from other student): https://arxiv.org/abs/2212.06436
  3. Create an app and test penetration-rate using ipv8-kotlin
synctext commented 1 year ago

MARE: "5G overlay infrastructure for decentralised learning"

Update:

synctext commented 1 year ago

Goal: mechanism for one phone to help another phone to puncture their carrier-grade NAT.

OrestisKan commented 1 year ago

Literature_Survey.pdf

synctext commented 1 year ago

Nearly done with the Lit Survey. [38] citations to forums and scientific papers. Great result to include: puncture log. Just put an as-simple-as-possible description of the SIM cards from 6 different 4G/5G providers.

Research Assistant job: send 50 UDP packets, count how many arrive. Repeat for all SIM-card combinations. Test the performance of EVA, note that you can then quickly run out of your 100-ish MByte SIM data quota. Read from Rahim on the binary transport protocol called EVA. See some example code: https://github.com/KoningR/eurotoken/blob/5c84348ba16dd9ce4b97e53ff52a5cefe9ee97c1/src/main/kotlin/evatest/EvaApplication.kt
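A minimal sketch of the "send 50 UDP packets, count how many arrive" measurement. It runs sender and receiver on the loopback interface only, so it shows the counting logic rather than a real SIM-to-SIM test; the function name `count_delivered`, the 4-byte sequence number payload, and the timeout are assumptions made for illustration.

```python
import socket

def count_delivered(packets=50, timeout=0.5):
    """Send `packets` numbered UDP datagrams over loopback and count distinct arrivals."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # let the OS pick a free port
    receiver.settimeout(timeout)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    target = receiver.getsockname()
    seen = set()
    try:
        for seq in range(packets):
            # Each datagram carries its sequence number so duplicates are not double counted.
            sender.sendto(seq.to_bytes(4, "big"), target)
        while len(seen) < packets:
            data, _ = receiver.recvfrom(64)
            seen.add(int.from_bytes(data, "big"))
    except socket.timeout:
        pass                             # no more packets arrived within the timeout
    finally:
        sender.close()
        receiver.close()
    return len(seen)

if __name__ == "__main__":
    delivered = count_delivered()
    print(f"{delivered}/50 packets arrived")
```

For a real measurement the receiver would run on the other phone and the loop would be repeated for every pair of SIM cards, filling one cell of the connectivity matrix per pair.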

OrestisKan commented 1 year ago

Literature_Survey (2).pdf

Lyca uses symmetric NAT. The rest (Lebara, T-Mobile and Vodafone) could cross-communicate, while they all failed with Lyca (even Lyca-to-Lyca communication failed). Theoretically, Lyca-to-Lyca communication may be achieved with the birthday paradox. We need to determine the address and port predictability in order to understand how long it would take for the NAT to be penetrated and how long it would take for Lyca to block the requests.
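To get a feeling for "how long it would take", the birthday-paradox odds can be computed directly. The sketch below assumes a simplified model that is not taken from the thread: one side keeps `open_ports` random external ports open, the other side probes distinct random ports, and every port in a space of 65535 is equally likely. When both sides are behind symmetric NAT the effective search space is larger, which can be approximated by passing a bigger `space`.

```python
def hit_probability(open_ports, probes, space=65535):
    """Probability that at least one of `probes` distinct random probes
    lands on one of `open_ports` open ports out of `space` ports."""
    miss = 1.0
    for i in range(probes):
        remaining_closed = space - open_ports - i
        if remaining_closed <= 0:
            return 1.0                  # every closed port has been tried already
        # Probability that probe i also misses, given all earlier probes missed.
        miss *= remaining_closed / (space - i)
    return 1.0 - miss

if __name__ == "__main__":
    for probes in (100, 500, 1000, 2000):
        print(probes, round(hit_probability(256, probes), 3))
```

Combined with the measured packet rate, the number of probes needed for a target probability gives a rough estimate of the time to penetration.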

Willingness to travel (and I have accommodation maybe?)

Reason for traveling: live physical testing of 4G and 5G communications and procurement of SIM cards

Research assistantship ending 30/09/23!

synctext commented 1 year ago
OrestisKan commented 1 year ago

Final Literature Survey with the suggested improvements Literature_Survey.pdf

synctext commented 1 year ago

Comments on this latest survey:

OrestisKan commented 1 year ago

Literature_Survey (1).pdf Latest (hopefully final) version with all the suggestions for improvements that you requested

OrestisKan commented 1 year ago

@synctext the birthday attack between a phone on Vodafone 5G and an emulator on eduroam Wi-Fi worked: they managed to connect. It still needs optimisation because it is heavy, but at least we know it works! More details in my Slack message.

whats left to do:

OrestisKan commented 1 year ago

Literature_Survey (3).pdf

synctext commented 1 year ago

Solid progress! Survey completed, now ready for arXiv submission. Thesis brainstorm: link the TensorFlow Lite which Quinten van Es got operational to the birthday attack. Get a healthy IPv8 overlay. Focus on binary transfer for "decentralised Artificial Intelligence". Fix the "information diffusion problem". Measure UDP bandwidth throughput. EVA protocol also: this whole issue warning bad code :mask: Determine bottleneck. Improve. Write thesis. DONE!!!

Improve activity grid principle of status of each of the 25 connected IPv8 peers. image

Related IPFS work: https://github.com/plprobelab/network-measurements/blob/master/results/rfm15-nat-hole-punching.md The measurement was designed to provide insights into when and why the DCUtR protocol fails in NAT hole punching and to provide recommendations for improvement. In total, we tracked 6.25M hole punches from 212 clients (API keys). The clients were deployed in 39 different countries and hole punched remote peers in 167 different countries. Our top findings were that: libp2p’s hole punching success rate is around 70%. https://research.protocol.ai/publications/decentralized-hole-punching/

OrestisKan commented 1 year ago

THESIS TITLE (draft): First 5G deployment of Distributed Artificial Intelligence

IEEE_Conference_Template.pdf

Measure: UDP bandwidth, bottlenecks, timeouts on the Android client and the NATs, connection reset time and port association time, all conditions that make successful communication possible, and a complete understanding of all factors that cause a communication failure. Determine if there is an upper bound to the number of concurrent IPs that a device can talk to (e.g. 63 works and adding a 64th may break the least recently used).
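The "63 works and adding a 64th may break the least recently used" hypothesis can be expressed as a toy model to compare measurements against. This is only a sketch of the hypothesised behaviour, not of any real NAT: the class name `LruMappingTable` and the capacity of 63 are illustrative assumptions.

```python
from collections import OrderedDict

class LruMappingTable:
    """Toy model: a NAT that keeps at most `capacity` remote peers per device
    and evicts the least recently used one when a new peer is contacted."""

    def __init__(self, capacity=63):
        self.capacity = capacity
        self.entries = OrderedDict()    # remote peer -> True, oldest first

    def touch(self, remote):
        """Record traffic to `remote`; return the evicted peer, or None."""
        if remote in self.entries:
            self.entries.move_to_end(remote)   # refresh: now most recently used
            return None
        self.entries[remote] = True
        if len(self.entries) > self.capacity:
            evicted, _ = self.entries.popitem(last=False)
            return evicted
        return None

if __name__ == "__main__":
    table = LruMappingTable(capacity=63)
    for peer in range(64):
        evicted = table.touch(f"peer-{peer}")
    print("evicted by the 64th peer:", evicted)
```

An experiment would contact peers one by one and check after each new peer whether the oldest one can still get packets through; the first failure gives the observed capacity.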

Reliable data transfer: compare UDP and EVA protocol in terms of effective throughput, packet loss, congestion

Measure the exact NAT behaviour!

Measure NAT hole opening time!

I have 10 or 12 operational SIM cards. I have two phones, hence I can use 2 SIM cards at a time.

synctext commented 1 year ago

update "This is brute forcing the public IP"{+port}, nice and sharp description somebody from Canada gave your work.

OrestisKan commented 1 year ago

SURVEY to be announced by arXiv tomorrow.

I added tests for:

TODO:

Goal by Christmas:

  1. Quantify all measurements for the simcards
  2. Integrate the Birthday Attack in Ipv8
OrestisKan commented 1 year ago

Lit Survey is published: https://arxiv.org/abs/2311.04658

Edited to fix the broken reference link

OrestisKan commented 1 year ago

I HAVE CODE FOR:

Measuring:

synctext commented 12 months ago
OrestisKan commented 11 months ago

The Github repo of the research

View data gathering progress in this google sheet

OrestisKan commented 11 months ago

The first results are that the success rate of the birthday attack is low and very dependent on the provider, as can be seen [here](https://docs.google.com/spreadsheets/d/1hmGZ38y3Cngt8hsbJbR7SoZpRnAUu7uKivV9ODkhKSs/edit?usp=sharingl). I propose to gather data on the mapping of the NAT. A server listens to incoming packets from a phone and logs the return address:port, while the phone does the same (logging the address:port that it sent the packet from). The results can then be compared and we can reverse engineer the mapping function of each NAT. This can be used to reduce the collision space (now 65535^2). According to RFC 4787 the NAT mapping protocol has different behaviour on different ranges, hence identifying the "convenient" ranges for each carrier will allow us to reduce the collision space and increase the connectivity rate!
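Comparing the two logs could start with a coarse classification like the sketch below. It assumes each sample is a `(local_port, external_port)` pair, listed in the order the packets were sent and each sent to a different destination; the function name and the labels are illustrative, loosely following RFC 4787 vocabulary.

```python
def classify_mapping(samples):
    """Classify NAT port mapping from (local_port, external_port) pairs.

    Returns (label, delta), where delta is the constant step between
    consecutive external ports for sequential mappings and None otherwise.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if all(local == external for local, external in samples):
        return ("port-preserving", None)
    externals = [external for _, external in samples]
    # Step between consecutive external ports, modulo the 16-bit port space.
    deltas = {(b - a) % 65536 for a, b in zip(externals, externals[1:])}
    if deltas == {0}:
        return ("endpoint-independent", None)   # same external port for every destination
    if len(deltas) == 1:
        return ("sequential", deltas.pop())
    return ("unpredictable", None)

if __name__ == "__main__":
    log = [(5000, 40001), (5000, 40002), (5000, 40003)]
    print(classify_mapping(log))
```

A NAT that lands in the "sequential" class is the easiest target, because a single observed port predicts the next ones.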

synctext commented 11 months ago

Idea of a "biased birthday attack" if you know the port-range, behaviour of used 4G/5G provider, or even the mapping function itself (trivial +1 counter). Portugal and Greek SIM card also probably going to be procured. Six years ago the superapp would show "NL KPN" for people you are connected to.

OrestisKan commented 10 months ago

Update on data gathering: the Android app that will spam the server is ready. The server was very hard to build because of the 65k simultaneous processes. I managed to run 30k ports successfully yesterday; going beyond 30k throws an exception because it runs out of memory on a machine with 16 GB RAM. So I emailed Sandip yesterday to ask for a 64 GB server, still waiting for a reply.

My idea is to change the Birthday Attack based on the data gathered hoping to improve it. Then that repo will become a generic birthday attack public library for android connectivity that will be published in Gradle.

There's no plan to use IPv8 as a dependency.

Note: buying SIM cards from random countries is useless without physically using them in the country on the local network.

Plan for the next 3 weeks:

synctext commented 10 months ago

@Apple1D Indeed, strong authentication and identity management stuff is ignored by computer science for too long. Also no industry support, as it's not a golden money maker. Governments have decades of failures and many losses trying to craft this societal infrastructure. See our scientific analysis: https://arxiv.org/pdf/2401.05239.pdf

synctext commented 10 months ago
Please keep track of your planning. In Feb 2023 we need to do your master thesis progress moment.

| Date | Milestone |
|------|-----------|
| 20 Sep 2023 | First ever successful Vodafone 5G birthday attack :clap: :tada: |
| Nov 2023 | first code for UDP bandwidth, port association, ping time, etc |
| Dec 2023 | understanding of NAT mapping |
| Jan 2024 | generic library |
| Feb-May 2024 | 4G/5G measurement inside various EU countries |
| Feb-March 2024 | integrate with Superapp + fix EVA binary transport |
| March 2024 | finish writing Introduction + Problem description chapters |
| April 2024 | integrate distributed machine learning: #7254 |
| May 2024 | Do experiments + finish writing experimental section thesis |
| 1 June 2024 | Thesis done |
| 13 June 2024 | Tentative Graduation Date :boom: |

OrestisKan commented 10 months ago

Updates on Lebara Research:

A single run looks like this: packets sent from any source port are mapped to a few specific "buckets" (ranges of ports), as shown in the image (WhatsApp Image 2024-02-01 at 16 41 46).

The problem is that these buckets are not consistent across runs, and they change based on timeouts of the port, the number of requests, which again is not consistent (after analysis)

What we know for sure:

Breakdown follows:

Ranges of ports that were never mapped

[(0, 1023), (19200, 19711), (40959, 41471), (49408, 49919), (60918, 65535)]

(image: larger scatter plot (1))

Frequently Observed Ranges

The ranges and their occurrence frequency (as a fraction):

- [35328, 35583] -> 0.4762
- [9216, 9471] -> 0.2857
- [55040, 55295] -> 0.2857
- [15616, 15871] -> 0.2381
- [29440, 29653] -> 0.2381
- [43520, 43775] -> 0.2381
- [61184, 61439] -> 0.2381

I chose a 35% probability for all other ports and created a function.

    import random

    def pick_next_port():
        # Build the candidate port ranges and their weights from the observed data.
        list_of_ranges, weights_list = build_ranges_lists_and_weights(
            ranges_with_occurrence_frequency, reachable_ports)
        # Weighted choice of a range, then a uniform choice of a port inside it.
        chosen_range = random.choices(list_of_ranges, weights=weights_list)[0]
        return random.choice(chosen_range)

That gives the next port to send to (in this case, the input is Lebara specific, which will soon be based on your receiver's telecom provider).

This function takes 8.606910705566406e-05 seconds to run, so it is fast enough not to interfere with the app's speed (also, when ported to Kotlin, it will be faster).
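The helper `build_ranges_lists_and_weights` is not shown in this thread, so the following is only a guess at what it might do, written as a self-contained sketch. Assumptions: the frequent ranges share 65% of the probability in proportion to their observed frequency, the remaining 35% goes to all other ports that are not in a never-mapped range, and the functions take explicit arguments instead of globals. Note that the listed range 61184-61439 lies inside the never-mapped range 60918-65535; the sketch keeps it as a frequent range, exactly as listed.

```python
import random

# Values copied from the Lebara breakdown above.
NEVER_MAPPED = [(0, 1023), (19200, 19711), (40959, 41471), (49408, 49919), (60918, 65535)]
FREQUENT = {
    (35328, 35583): 0.4762, (9216, 9471): 0.2857, (55040, 55295): 0.2857,
    (15616, 15871): 0.2381, (29440, 29653): 0.2381, (43520, 43775): 0.2381,
    (61184, 61439): 0.2381,
}

def build_ranges_lists_and_weights(frequent, never_mapped, other_weight=0.35):
    """Return (ranges, weights): one entry per frequent range plus one for 'all other ports'."""
    excluded = set()
    for low, high in never_mapped:
        excluded.update(range(low, high + 1))
    ranges, weights = [], []
    total = sum(frequent.values())
    for (low, high), freq in frequent.items():
        ports = range(low, high + 1)
        ranges.append(ports)
        # Frequent ranges split the remaining probability by observed frequency.
        weights.append((1.0 - other_weight) * freq / total)
        excluded.update(ports)
    # Everything that is neither never-mapped nor already covered by a frequent range.
    ranges.append([p for p in range(65536) if p not in excluded])
    weights.append(other_weight)
    return ranges, weights

def pick_next_port(ranges, weights):
    """Weighted choice of a range, then a uniform choice of a port inside it."""
    chosen_range = random.choices(ranges, weights=weights)[0]
    return random.choice(chosen_range)

if __name__ == "__main__":
    ranges, weights = build_ranges_lists_and_weights(FREQUENT, NEVER_MAPPED)
    print([pick_next_port(ranges, weights) for _ in range(5)])
```

Building the lists once and reusing them for every pick keeps the per-port cost to two random draws.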

I will continue to analyze and try to find more relations for now. Unfortunately, the "seed port" choice seems pretty random.

If anyone knows some data scientist/mathematician that can help, that would be great because this is getting out of my area of knowledge

I want also @synctext insights on any Machine learning /statistical approaches because atm it is outside of my realm and I only do random weighted choice on a range

Fallback Mechanism Proposal

Since it is now established that the NAT picks a random "seed port" and then increments linearly, I want to test whether the linear increment is affected by the IP of the receiver, i.e. whether each new receiver forces a new random seed port. If not, we can use a STUN-like server to log the initial seed port, and then the other peer will have a starting point.

synctext commented 9 months ago

Solid progress! Please keep full focus on measuring several SIM cards. Afterwards we can exploit them.

Easy tricks like one side doing 64k -1 port attempts and the remote side doing +1 through the SIM NAT? So one side uses a carrier-grade NAT with an integral +1 algorithm. Our side wants to connect, starts with the highest port number and counts down with -1. Without any failure or any timeout they are guaranteed to connect within 64k attempts, usually somewhere in the middle. Symmetric means that the +1 side needs to contain the correct UDP port for the -1 port, then it will "open up" for an incoming packet. Everything needs to match :sob:

Working on rate control: 1k packets seems to be the max that the server can handle before starting to drop. ToDo later: quantify exact server drop behaviour.

Behavioural modelling chapter in thesis. Use a Markov state-transition model to model the symmetric NAT behaviour please: Lyca, T-Mobile (flaky :open_mouth:), Lebara, ??Cyprus??.

Debug info for cell tower:

ADDED: 89 characters of base-11?! Mobile networking in rural Ethiopia! by Ben Kuhn. On youtube
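The "+1 on one side, count down with -1 on the other" trick from this comment can be checked with a toy simulation. It is a sketch under strong assumptions that are not measured facts: the NAT allocates external ports strictly sequentially from a seed, every opened mapping stays open for the whole run, and the address and port filtering of a symmetric NAT (the "everything needs to match" part) is ignored. The usable port range 1024-65535 is also an assumption.

```python
import random

PORT_MIN, PORT_MAX = 1024, 65535   # assumed usable external port range

def attempts_until_meeting(seed_port):
    """Attempts needed until the descending prober hits a port opened by the +1 NAT."""
    opened = set()
    nat_port = seed_port
    probe = PORT_MAX
    attempts = 0
    while probe >= PORT_MIN:
        attempts += 1
        opened.add(nat_port)       # remote side sends a packet; NAT opens its next port
        if probe in opened:        # our probe reaches a port that is already open
            return attempts
        # Sequential +1 allocation, wrapping around at the top of the range.
        nat_port = nat_port + 1 if nat_port < PORT_MAX else PORT_MIN
        probe -= 1
    return None                    # not reached when seed_port is inside the range

if __name__ == "__main__":
    seed = random.randint(PORT_MIN, PORT_MAX)
    print(f"seed {seed}: met after {attempts_until_meeting(seed)} attempts")
```

In this model the two counters meet roughly halfway between the seed and the top of the range, so the worst case is about half the port space rather than all of it.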

OrestisKan commented 9 months ago
synctext commented 9 months ago
synctext commented 8 months ago
OrestisKan commented 8 months ago

Updates 18/03

Roaming update: there are SIM cards for which virtually nothing changes when you leave home, because the traffic is tunnelled home. Lyca NL, Lebara NL, MTN CY, and Lebara FR were tested and found to change the IP while roaming.

Check if while roaming it behaves the same as the partner (open research question)

Server right now:

synctext commented 8 months ago

Update: idea to use more external IPv4 addresses on your server. That means expanding your testing infrastructure with probing from multiple addresses. Can you start measuring for a while from 1 address and predict what the other address will see as port mapping? {hope this is understandable}. If any stranger on the Internet can help you predict your port mapping (or not) you've made scientific progress. Both a positive and a negative outcome is progress and thesis material. thnx

The following 5 IPs are assigned to your server: YYY.ZZZ.119.XXX :

OrestisKan commented 7 months ago

Gathered Belgian and Norwegian data this week and fixed the bugs in the server that were causing it to crash. Updated the paper with some changes to the measurements used and the data gathered.

I believe I now have enough SIM cards in my possession, and I'll focus on analyzing the results of this sprint.

Todo:

First_5G_deployment_of_Distributed_Artificial_Intelligence.pdf

Planning to charter a private Piper Aircraft soon to do a sim card run in another EU member state

synctext commented 7 months ago
OrestisKan commented 7 months ago

Updates last sprint:

Updated thesis:

Next Sprint:

synctext commented 7 months ago
OrestisKan commented 7 months ago

Vodafone NL fitted to a beta distribution (image: vodaphone-betta-distribution)
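The thread does not say how the fit was done or which variable was fitted. As a sketch, assuming the fitted variable is the observed external port normalised to [0, 1], a beta distribution can be estimated with the method of moments using only the standard library; `normalise_ports` and `fit_beta_moments` are hypothetical names.

```python
import random
from statistics import mean, pvariance

def normalise_ports(ports, low=1024, high=65535):
    """Map port numbers onto [0, 1]; the range bounds are assumptions."""
    return [(p - low) / (high - low) for p in ports]

def fit_beta_moments(values):
    """Method-of-moments estimate of (alpha, beta) for samples in (0, 1)."""
    m = mean(values)
    v = pvariance(values)
    if not 0 < v < m * (1 - m):
        raise ValueError("sample variance is not compatible with a beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

if __name__ == "__main__":
    # Synthetic stand-in for measured ports, drawn from a known beta distribution.
    random.seed(0)
    fake_ports = [1024 + int(random.betavariate(2, 5) * (65535 - 1024)) for _ in range(5000)]
    alpha, beta = fit_beta_moments(normalise_ports(fake_ports))
    print(f"alpha={alpha:.2f} beta={beta:.2f}")
```

Sampling `random.betavariate(alpha, beta)` and scaling back to the port range would then give candidate ports biased towards where the carrier tends to map.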

OrestisKan commented 6 months ago

Progress update:

Shifted focus from buying SIMs to getting the library to work. The library for BirthdayAttack is done and theoretically works in the unit tests against historical data, and the algorithms for port prediction seem to be an improvement over randomness.

When the library is imported into an Android app, compiled and run on the phone, it stops sending packets at around 29k packets (out of ~250k). NO ERROR, NO CRASHING, it just stops sending packets (every time a packet is sent, a print message is written).

I bought physical sims from different carriers: 4 from Cyprus, 2 from Romania, and 6 from the UK, bringing the total to 21. Waiting for delivery for sims from Greece, Portugal and possibly Turkey (Turkey is not guaranteed due to extra charges and generally hard to test; maybe I can manage 1)

synctext commented 6 months ago
OrestisKan commented 5 months ago

@synctext roaming potentially ruins the birthday attack. Foreign (roaming) carriers seem impenetrable (all of them), which is suspicious. Local carriers (the Netherlands before, Cyprus now) seem to be working fine. Currently looking into it, but carriers that were easy to penetrate suddenly are not as soon as one is roaming (even though the roaming IP shows an IP of the carrier's country). Looking into whether the mapping changes while roaming and how the behaviour changes.

Update: Norwegian carrier Telia's NAT mapping timer falls from 300 seconds in Norway to just 2 seconds, MyCall's to 17. Belgium's Lyca time to live is so small that it is not even logged.
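A mapping timer like this can be located with a binary search over idle times. The sketch below only shows the search logic: `mapping_survives` is a hypothetical callback that would open a mapping, stay idle for the given number of seconds, and then report whether a packet from the server still gets through. It assumes the answer is monotonic (once the mapping has expired, it stays expired for longer idle times).

```python
def find_mapping_timeout(mapping_survives, low=1, high=600):
    """Largest idle time in seconds within [low, high] after which the mapping is still alive.

    Returns None if the mapping does not even survive `low` seconds.
    """
    if not mapping_survives(low):
        return None
    if mapping_survives(high):
        return high                 # timeout is at least `high`
    # Invariant: the mapping survives `low` seconds but not `high` seconds.
    while high - low > 1:
        mid = (low + high) // 2
        if mapping_survives(mid):
            low = mid
        else:
            high = mid
    return low

if __name__ == "__main__":
    # Simulated NATs standing in for real probes.
    print(find_mapping_timeout(lambda idle: idle <= 300))
    print(find_mapping_timeout(lambda idle: idle <= 2))
```

Each probe costs as long as the idle time being tested, so the logarithmic number of probes matters when timers are in the range of minutes.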

Testing KPN showed no change in the timer, but hours of trying to penetrate it without success make me believe that there are some differences. The same holds for Vodafone NL, which was very easy to penetrate and is now suddenly impossible to get a success with.

synctext commented 5 months ago
OrestisKan commented 5 months ago

Latest paper with new graphs and all.

Improvement of the Birthday Attack seems to be working; waiting for Odido results (~Saturday/Sunday) to quantify by how much it improves / hypothesis test

TODOs:

First_5G_deployment_of_Distributed_Artificial_Intelligence.pdf

synctext commented 5 months ago
OrestisKan commented 5 months ago

By next meeting:

Least risky 3rd exp section is p2p tiktok

OrestisKan commented 4 months ago

Updated and improved the text of the METHODOLOGY chapter. Made it more clear and explained why each test is useful. Better explained the algorithms and the tests that were to be performed

First_5G_deployment_of_Distributed_Artificial_Intelligence.pdf