Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

the first AI dAPP - self-evolving AI-DAO #5987

Closed synctext closed 2 weeks ago

synctext commented 3 years ago

Master course on Blockchain Engineering project 2021

TEAM1: the first AI-dApp. Create a dApp with federated machine learning to understand the music taste of the user and recommend more. Exchange training vectors on the blockchain overlay. Never share with others which BitTorrent music swarms you like. By sharing only training data you can discover new music and still preserve privacy. Prior work and this.

General Description - self-evolving AI-DAO (https://github.com/Tribler/tribler/issues/5944)

Students from Delft have created a blockchain-based alternative to Spotify. It is completely decentralised: it uses BitTorrent for streaming, Bitcoin for payments to artists, TrustChain with IPv8 for music discovery, and IPv8 for app-to-app connectivity. With multiple teams the aim is to take this code to the next level: self-evolution.

A total of 4 student teams (4-5 students in each team) will work together on a cutting-edge scientific problem: how to create a software system which can be expanded in real time and become increasingly 'intelligent'. Build upon the existing open source app by TU Delft on the Android Play store using blockchain technology: the Superapp. You will help transform essential parts of the music industry and replace them with open source software. Current code:

GIF: Browsing and streaming music with Bittorrent
GIF: Sending money to artists using Bitcoin

DAO - organisations in software

To make a "self-evolving" app we use the DAO concept. What is a DAO? Within the coming decades the future of jobs, employment, and the nature of the firm will change profoundly. Automation, AI, and robots will replace many of today's jobs. A new type of company is a company without any employees, machines, or physical infrastructure. A Decentralized Autonomous Organization (DAO) exists only in software. It goes beyond smart contracts: it is a complete company inside software. DAO development is still in the experimental stage. Background reading: a very optimistic view on DAOs, and the official US review of the DAO by the Securities and Exchange Commission.

Within this master course you can create your very own autonomous organisation, the AI-DAO. Learn to engineer a decentralised autonomous organisation, use the existing tools, and understand the security risks. The aim is to alter the nature of the firm in the Internet age (see the Nobel Prize-winning theory). Production becomes essentially cost-free. The result is an organisation which exists purely in cyberspace. The AI-DAO is designed to be the first sustainable DAO. How can we empower leaderless organisations? How can a DAO earn money from manipulating bits?

Scientific challenge: Self-evolving

A key step in an autonomous system is that it can evolve independently. This enables growth and evolution independently of any central organisation, sponsoring government, or tribe of volunteers.

You will collectively solve the problem of paying somebody to make new features in open systems which are fully decentralised. This goes further than paying somebody Bitcoins to create a new version. Decentralised technology is very robust to failures, manipulation, faults, and court cases. For instance, the Internet itself is almost impossible to shut down, and so is the "Tor darknet". With other teams you will address a key drawback of decentralised technology: it is difficult to update, nearly impossible to evolve, and lacks incentives to develop new features.

dApp ecosystem

"Distributed Applications" are a distributed way of running code. You will help develop an ecosystem of "global code". Code is running atop a blockchain and peer-to-peer (P2P) network that acts as a kind of operating system. This provides security, resilience, privacy, and novel features. This is related to smart contracts, but has no slow single virtual machine (all discussed in the online classes material). Background material, read FBASE trustworthy code execution

PNG: difference between cloud and decentralised Apps
synctext commented 3 years ago

JCBrouwer commented 3 years ago

Hello world

nata1y commented 3 years ago

Hello world

skullyhoofd commented 3 years ago

Hello world, Ricardo here!

MarkovChainmail commented 3 years ago

hello world, this is esmee, over

nata1y commented 3 years ago

Hi, is it possible to provide us with the type / format of training data that we can get from the Superapp? What features can we get, and in which format? Do we understand correctly that no preprocessing is done on user data yet? Thanks in advance!

MarkovChainmail commented 3 years ago

Stuff we need to do/find out:

JCBrouwer commented 3 years ago

Tasks

  1. Getting music info from the app @ehildebrand
  2. Finding a dataset
  3. Feature vectors from music and online SVM @nata1y
  4. Gossiping models @JCBrouwer
  5. Predictions with model in interface @ehildebrand
  6. Like button in interface @ehildebrand

JCBrouwer commented 3 years ago

PySyft evaluation

https://github.com/JCBrouwer/fedrecsys

Pros:

Cons:

nata1y commented 3 years ago

A brief overview of current findings: We are going to create a separate Android library dedicated to federated ML. We are going to use the online-learning interface of the Smile library's Kotlin API (https://haifengl.github.io/quickstart.html) to train models in place. We are going to implement the models from the paper: Pegasos SVM and Adaline. The concrete feature encoding/engineering technique is not clear yet; it depends on the type of data we can get from the user and will be discussed during the next meeting.
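For reference, the Pegasos update mentioned above can be sketched in a few lines of plain Kotlin. This is only an illustrative sketch, not the Smile API and not the team's implementation: the class name `PegasosSvm`, the regularisation value, and the two-dimensional toy feature vectors are all assumptions. Each call to `update` processes one labelled example (label +1 for liked, -1 for disliked) with step size 1/(lambda * t), which is what makes it usable as an online learner on a phone.

```kotlin
// Hypothetical sketch of an online Pegasos SVM; all names are illustrative.
class PegasosSvm(private val dim: Int, private val lambda: Double = 0.01) {
    val weights = DoubleArray(dim)
    private var step = 0

    // Dot product between the weight vector and a feature vector.
    fun score(x: DoubleArray): Double {
        var s = 0.0
        for (i in 0 until dim) s += weights[i] * x[i]
        return s
    }

    // One stochastic sub-gradient step on a single example (y must be +1 or -1).
    fun update(x: DoubleArray, y: Int) {
        require(x.size == dim) { "feature vector has the wrong size" }
        require(y == 1 || y == -1) { "label must be +1 or -1" }
        step += 1
        val eta = 1.0 / (lambda * step)      // learning rate 1 / (lambda * t)
        val margin = y * score(x)            // computed before the weights change
        val shrink = 1.0 - eta * lambda      // regularisation shrinkage
        for (i in 0 until dim) {
            weights[i] *= shrink
            // The hinge loss is active only when the margin is below 1.
            if (margin < 1.0) weights[i] += eta * y * x[i]
        }
    }

    fun predict(x: DoubleArray): Int = if (score(x) >= 0.0) 1 else -1
}

fun main() {
    val model = PegasosSvm(dim = 2)
    val liked = doubleArrayOf(1.0, 0.2)       // toy features of a liked track
    val disliked = doubleArrayOf(-0.8, -1.0)  // toy features of a disliked track
    repeat(50) {
        model.update(liked, 1)
        model.update(disliked, -1)
    }
    println(model.predict(liked))
    println(model.predict(disliked))
}
```

Because the model is just a weight vector plus a step counter, it is small enough to be serialised and exchanged between peers.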

JCBrouwer commented 3 years ago

Some literature

More efficient gossiping

These papers both propose more efficient gossiping by intelligently compressing parts of training. These can be useful to implement almost regardless of the exact recommendation models we choose to use. The second has a high-quality open-source implementation we can use as a reference (and the authors have a Slack where they are reachable).

User network based

Once we have basic feature-based recommendations and a like button in the interface, we can think of bootstrapping some network-based collaborative filtering. This can be based on an assembly of info (music features, listening history, and likes) to find similar users so we can recommend based on similarity. The first paper and last 2 papers go over how this can be done in a privacy-preserving way.

Centralized server approaches

Many approaches use a centralized parameter server to keep global updates. It seems plausible that a true central server could be replaced by a parameter server running locally on each device. These parameter servers could then share their weight updates via gossiping.
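One way to picture the "local parameter server plus gossip" idea is a merge step that each device runs when it receives a model from a peer. The sketch below assumes an age-weighted average, where `age` counts how many training updates a model has seen; this merge rule, the `GossipModel` class, and the `merge` function are illustrative assumptions, not something the thread or the cited papers prescribe.

```kotlin
// Hypothetical sketch: merging a locally held model with one received via gossip.
class GossipModel(val weights: DoubleArray, val age: Int)

// Age-weighted average: the model that has seen more updates counts for more.
fun merge(local: GossipModel, remote: GossipModel): GossipModel {
    require(local.weights.size == remote.weights.size) { "models must have the same shape" }
    val total = local.age + remote.age
    if (total == 0) return local  // neither model has been trained yet
    val merged = DoubleArray(local.weights.size) { i ->
        (local.age * local.weights[i] + remote.age * remote.weights[i]) / total
    }
    return GossipModel(merged, maxOf(local.age, remote.age))
}

fun main() {
    val local = GossipModel(doubleArrayOf(1.0, 0.0), age = 3)
    val remote = GossipModel(doubleArrayOf(0.0, 2.0), age = 1)
    val result = merge(local, remote)
    println(result.weights.toList())  // prints [0.75, 0.5]
    println(result.age)               // prints 3
}
```

After merging, a device would typically continue training on its own listening data before passing the model on, so no raw user data has to leave the phone.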

synctext commented 3 years ago

Possible gameplan for this entire project:

For next sprint: make progress on the above issues and report progress for next meeting.

EDIT: plus, this plug-in system for the "dApp" approach works fully, but usability is -100%: https://github.com/Tribler/trustchain-superapp/blob/master/freedomOfComputing/README.md

Bonus extension: integrate a compiler inside the superapp, to enable source code distribution and local compilation: https://play.google.com/store/apps/details?id=ru.iiec.jvdroid&hl=en_GB&gl=US (fewer security hazards, source code inspection)

EDIT2 (for future reference): Java compiler on Android: https://github.com/t-arn/java-ide-droid, heroic efforts for memory overflow, and https://play.google.com/store/apps/details?id=com.krazeapps.kotlinprogrammingcompiler

drew2a commented 3 years ago

Two suggestions from @drew2a side

Use a separate branch

As we are "production software" now, I suggest performing all the development of the Superapp in a separate branch to prevent interfering with the published code on the master branch.

To do this, we need to create a branch in https://github.com/Tribler/trustchain-superapp (e.g. feature/music-dao-recommendations). Then, all of our commits will go to this branch. At the end of the project, this branch will be merged to master.

If you are not familiar with git, take a look at https://www.gitkraken.com/

Split the network until the project is done

It will be tough to debug and test a changed application within the wild network. There are multiple scenarios of how we can make life easier for ourselves.

I propose the following approach:

  1. Deploy a dedicated bootstrap server (or a few). I can help with that.
  2. Change (or override) the list of bootstrap servers (see an example here: https://github.com/Tribler/kotlin-ipv8/commit/32a7286e20c254bd747be47c063fbcbdfe93d532)

PS

@drew2a we can seed this content for you: 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres and expand it with a crawl of 55,000 full audio tracks of Jamendo

Wow :)

JCBrouwer commented 3 years ago

Use a separate branch

As we are "production software" now, I suggest performing all the development of the Superapp in a separate branch to prevent interfering with the published code on the master branch.

Yes we're developing in branches on this fork. We can merge branches from there directly to the upstream master via pull request.

I propose the following approach:

1. Deploy a dedicated bootstrap server (or a few). I can help with that.

2. Change (or override) a list of bootstrap servers (see an example [here](https://github.com/Tribler/kotlin-ipv8/commit/32a7286e20c254bd747be47c063fbcbdfe93d532))

Ok sweet, that sounds good.

@drew2a we can seed this content for you: 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres and expand it with a crawl of 55,000 full audio tracks of Jamendo

Who sent you this? That definitely sounds like it can help our issues with amount of data!

drew2a commented 3 years ago

Who sent you this? That definitely sounds like it can help our issues with amount of data!

Johan sent me this

JCBrouwer commented 3 years ago

Who sent you this? That definitely sounds like it can help our issues with amount of data!

Johan sent me this

Ahh I read over it, nice!

MarkovChainmail commented 3 years ago

Andrei told me you can find the data storage here

nata1y commented 3 years ago

Main updates: We are in the process of developing a federated ML service for the music DAO. We are currently facing some issues integrating it with IPv8 and would like to discuss them during the meeting. We have changed the ML model so that we no longer use a feature-based approach. However, because of the limit on the size of a model that we can send over the network, we likely do not want to store all possible song-song combinations in it (with the corresponding similarity value/weight). Thus, the idea is to choose a predetermined set of songs (the most popular ones? randomly chosen ones for now?). Is this an acceptable approach?
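To make the size trade-off concrete, here is a rough sketch of a song-song weight table restricted to a fixed set of songs. A full table grows quadratically with the catalogue, while a table over n chosen songs needs only n * (n - 1) / 2 weights. Everything here is an assumption for illustration: the `PairwiseModel` class, the `topSongs` helper, and the choice of "most played" as the selection rule are hypothetical and not taken from the team's code.

```kotlin
// Hypothetical sketch: symmetric song-song weights over a predetermined song set.
class PairwiseModel(songIds: List<String>) {
    private val index: Map<String, Int> = songIds.withIndex().associate { (i, id) -> id to i }
    private val n = songIds.size

    // Only the upper triangle (i < j) is stored: n * (n - 1) / 2 values.
    private val weights = DoubleArray(n * (n - 1) / 2)

    val size: Int get() = weights.size

    // Maps an unordered pair of known songs to its slot, or null if not representable.
    private fun slot(a: String, b: String): Int? {
        val i = index[a] ?: return null
        val j = index[b] ?: return null
        if (i == j) return null
        val lo = minOf(i, j)
        val hi = maxOf(i, j)
        return lo * n - lo * (lo + 1) / 2 + (hi - lo - 1)
    }

    // Strengthen the link between two songs; pairs outside the set are ignored.
    fun bump(a: String, b: String, delta: Double = 1.0) {
        slot(a, b)?.let { weights[it] += delta }
    }

    fun similarity(a: String, b: String): Double = slot(a, b)?.let { weights[it] } ?: 0.0
}

// Picks the k most played songs as the predetermined set (one possible choice).
fun topSongs(playCounts: Map<String, Int>, k: Int): List<String> =
    playCounts.entries.sortedByDescending { it.value }.take(k).map { it.key }

fun main() {
    val counts = mapOf("a" to 10, "b" to 7, "c" to 3, "d" to 1)
    val model = PairwiseModel(topSongs(counts, 3))
    model.bump("a", "b")
    model.bump("b", "a")                  // same pair, order does not matter
    println(model.size)                   // prints 3
    println(model.similarity("a", "b"))   // prints 2.0
    println(model.similarity("a", "d"))   // prints 0.0 ("d" is outside the set)
}
```

The cost of this design is that songs outside the chosen set can never be recommended through the pairwise weights, so the selection rule matters.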

synctext commented 3 years ago

Discussed issues:

nata1y commented 3 years ago

Current progress:

Questions: it seems that a song that has been listened to gets downloaded into some local directory, but we could not find the location of this directory. Where is it?

nata1y commented 3 years ago

Here is the link to the repo: https://github.com/JCBrouwer/trustchain-superapp

The main updates are on the dev branch.

synctext commented 3 years ago

Discussion of progress, 1 team member present. Please focus on getting code to compile and run as :1st_place_medal: priority. Then make stuff fancy. No need to repair or improve the superapp. The Smile library is huge (ML, NLP, etc.); please ignore it and do a low-performance, simple approach from scratch. The superapp might have issues with multiple Kotlin versions mixed.

nata1y commented 3 years ago

Updates 15.03:

However:

Screenshot from 2021-03-13 22-52-31

synctext commented 3 years ago

nata1y commented 3 years ago

Main updates:

We are planning to concentrate on testing everything with 2+ peers in the upcoming weeks. We would also like to add Bayesian learning from here to learn more accurate personal rankings.

synctext commented 3 years ago

drew2a commented 3 years ago

@nata1y @JCBrouwer This is an example of working with kotlin-ipv8 outside of android: https://github.com/Tribler/trustchain-superapp/blob/master/musicdao-datafeeder/src/main/java/com/example/musicdao_datafeeder/DataFeeder.kt

```shell
./gradlew :musicdao-datafeeder:run --args="/home/user/torrents nopublish"
```

drew2a commented 3 years ago

A minimal program might look like this:

```kotlin
// Imports from kotlin-ipv8, SQLDelight, and the musicdao module are omitted.

// MusicCommunity overlay backed by an in-memory TrustChain store.
fun musicCommunity(): OverlayConfiguration<MusicCommunity> {
    val driver: SqlDriver = JdbcSqliteDriver(JdbcSqliteDriver.IN_MEMORY)
    Database.Schema.create(driver)
    return OverlayConfiguration(
        factory = MusicCommunity.Factory(
            settings = TrustChainSettings(),
            database = TrustChainSQLiteStore(Database(driver))
        ),
        walkers = listOf(RandomWalk.Factory())
    )
}

// Discovery overlay used to find and keep track of other peers.
fun discoveryCommunity() = OverlayConfiguration(
    factory = DiscoveryCommunity.Factory(),
    walkers = listOf(
        RandomWalk.Factory(timeout = 3.0, peers = 20),
        RandomChurn.Factory(),
        PeriodicSimilarity.Factory()
    )
)

// IPv8 instance listening on UDP port 8090, without Bluetooth, with a fresh key.
fun ipv8() = IPv8(
    endpoint = EndpointAggregator(
        udpEndpoint = UdpEndpoint(
            port = 8090,
            ip = InetAddress.getByName("0.0.0.0")
        ), bluetoothEndpoint = null
    ),
    configuration = IPv8Configuration(
        overlays = listOf(
            discoveryCommunity(),
            musicCommunity()
        ), walkerInterval = 1.0
    ),
    myPeer = Peer(JavaCryptoProvider.generateKey())
)

fun main(args: Array<String>) {
    val ipv8 = ipv8()
    ipv8.start()
}
```

JCBrouwer commented 3 years ago

Updates (Penultimate Edition?)

synctext commented 3 years ago

JCBrouwer commented 3 years ago

PR 1

PR 2

synctext commented 3 years ago

Raw .APK: https://github.com/JCBrouwer/trustchain-superapp/raw/federated-music-recommendation/gossipML/app-debug-gossipML.apk

To conclude:

JCBrouwer commented 3 years ago

Federated music recommendation has been merged :tada: