the first AI dAPP - self-evolving AI-DAO

synctext commented 3 years ago

Master course on Blockchain Engineering project 2021

TEAM1: the first AI-dApp. Create a dApp with federated machine learning to understand the music taste of a user and recommend more. Exchange training vectors on the blockchain overlay. Never share with others which Bittorrent music swarms you like. By sharing only training data you can discover new music and still preserve privacy. Prior work and this.

General Description - self-evolving AI-DAO (https://github.com/Tribler/tribler/issues/5944)

Students from Delft have created a blockchain-based alternative to Spotify. Completely decentralised. Uses Bittorrent for streaming, Bitcoin for payments to artists, Trustchain with IPv8 for music discovery and IPv8 for app-to-app connectivity. With multiple teams the aim is to take this code to the next level: self-evolution.

A total of 4 student teams (4-5 students in each team) will work together on a cutting-edge scientific problem: how to create a software system which can be expanded in real-time and increasingly become more 'intelligent'. Build upon the existing open source app by TUDelft on the Android Play store using blockchain technology: the Superapp. You will help transform essential parts of the music industry and replace them with open source software. Current code:

GIF: Browsing and streaming music with Bittorrent

GIF: Sending money to artists using Bitcoin

DAO - organisations in software

To make a "self-evolving" app we use the DAO concept. What is a DAO? Within the coming decades the future of jobs, employment and the nature of the firm will change profoundly. Automation, AI, and robots will replace many of today's jobs. A new type of company is a company without any employees, without any machines or physical infrastructure. A Decentralized Autonomous Organization (DAO) exists only in software. It goes beyond smart contracts: it is a complete company inside software. DAO development is still in the experimental stage. Background reading: a very optimistic view on DAOs, and the official US review of the DAO by the Securities and Exchange Commission.

Within this master course you can create your very own autonomous organisation, the AI-DAO. Learn to engineer a decentralised autonomous organisation, use the existing tools, and understand the security risks. The aim is to alter the nature of the firm in the Internet age, see the Nobel prize winning theory. Production costs become essentially zero. An organisation which exists purely in cyberspace. The AI-DAO is designed to be the first sustainable DAO. How can we empower leaderless organizations? How can it earn money from manipulating bits?

Scientific challenge: Self-evolving

A key step in an autonomous system is that it can evolve independently. This enables growth and evolution independently of any central organisation, sponsoring government, or tribe of volunteers.

You will collectively solve the problem of paying somebody to make new features in open systems which are fully decentralised. This goes further than paying somebody Bitcoins to create a new version. Decentralised technology is very robust to failures, manipulation, faults, and court cases. For instance, the Internet itself is almost impossible to shut down, and so is the "Tor darknet". With other teams you will address a key drawback of decentralised technology: it is difficult to update, nearly impossible to evolve, and lacks incentives to develop new features.

dApp ecosystem

"Distributed Applications" are a distributed way of running code. You will help develop an ecosystem of "global code". Code is running atop a blockchain and peer-to-peer (P2P) network that acts as a kind of operating system. This provides security, resilience, privacy, and novel features. This is related to smart contracts, but has no slow single virtual machine (all discussed in the online classes material). Background material, read FBASE trustworthy code execution

PNG: difference between cloud and decentralised Apps

synctext commented 3 years ago

Compile Superapp from sources
understand the machine learning parts: Prior work, federated machine learning @JCBrouwer
What is a distributed App? compile the FBASE code Repo
Extend the MusicDAO with recommendation @ehildebrand
Main ToDo for progress: add a dummy recommended album in MusicDAO

JCBrouwer commented 3 years ago

Hello world

nata1y commented 3 years ago

Hello world

skullyhoofd commented 3 years ago

Hello world, Ricardo here!

MarkovChainmail commented 3 years ago

hello world, this is esmee, over

nata1y commented 3 years ago

Hi, Is it possible to provide us with a type / format of training data that we can get from Superapp? What are the features we can get and in which format? Do we understand correctly that there is no preprocessing done on user data yet? Thanks in advance!

MarkovChainmail commented 3 years ago

Stuff we need to do/find out:

What type of information can we find about the user?
- User listening history
- Implement "like button" yes/no?
Recommend local songs or also songs held by peers?
How to perform ML based on music?
- How to encode music?
- How to get songs out of the database and encode them?
- How to weigh songs in recommender?
Find Music Dataset
What does Johan want w.r.t. FBase.
Start a new folder for our project?

JCBrouwer commented 3 years ago

Tasks

1) getting music info from the app @ehildebrand
2) finding a dataset
3) feature vectors from music and online svm @nata1y
4) gossiping models @JCBrouwer
5) predictions with model in interface @ehildebrand
6) like button in interface @ehildebrand

JCBrouwer commented 3 years ago

PySyft evaluation

https://github.com/JCBrouwer/fedrecsys

Pros:

PySyft is super plug and play, it was pretty simple to get a basic recommender and MNIST classifier up & running
All the training communication stuff is handled for us, we just need to spawn a server on our phone and tell other phones that they should join our training session. This makes it pretty amenable to the gossip training paper Johan linked us
the maintainers are pretty responsive on slack

Cons:

Python based. Even though the Syft worker nodes can run in Kotlin, it requires running a coordinator server (PyGrid) which is only available in python. We could still probably wrap this up and call it from Kotlin using e.g. https://github.com/dickensas/kotlin-gradle-templates/tree/master/embed-python but it's less than ideal. Or maybe implement this part ourselves using ipv8 stuff
It uses PyTorch which is very deep learning focused which we probably don't have the data for
Syft is undergoing a full rewrite at the moment, so we'd be building on version 0.2.x which is no longer supported

nata1y commented 3 years ago

A brief overview of current findings: We are going to create a separate Android library dedicated to federated ML. We are going to use the online learning interface from the Smile Kotlin library (https://haifengl.github.io/quickstart.html) to train models in place. We are going to implement the models from the paper: Pegasos SVM and Adaline. The concrete feature encoding/engineering technique is not clear yet; it depends on the type of data we can get from the user, to be discussed during the next meeting.
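For illustration, a minimal sketch of what one online Pegasos-style update could look like. It does not use the Smile API; the class name `PegasosSvm`, the regularisation value and the +1/-1 label encoding (liked / not liked) are assumptions made for this example only.

```kotlin
/** Minimal online linear SVM trained with Pegasos-style sub-gradient steps (illustrative only). */
class PegasosSvm(private val dim: Int, private val lambda: Double = 0.01) {
    val weights = DoubleArray(dim)
    private var t = 0 // number of updates seen so far

    /** Raw decision value w . x; positive means "predicted liked". */
    fun score(x: DoubleArray): Double {
        var s = 0.0
        for (i in 0 until dim) s += weights[i] * x[i]
        return s
    }

    /** One online update; label must be +1.0 or -1.0. */
    fun update(x: DoubleArray, label: Double) {
        t += 1
        val eta = 1.0 / (lambda * t)             // decaying step size 1 / (lambda * t)
        val violated = label * score(x) < 1.0    // margin check uses the weights before the update
        for (i in 0 until dim) {
            weights[i] *= (1.0 - eta * lambda)   // shrink step from the regulariser
            if (violated) weights[i] += eta * label * x[i]
        }
    }
}
```

Because the model is updated one sample at a time, it can keep learning on the device as new listening events arrive, which is the property that makes it a candidate for gossip-based training.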

JCBrouwer commented 3 years ago

Some literature

More efficient gossiping

These papers both propose more efficient gossiping by compressing parts of training intelligently. These can be useful to implement almost regardless of the exact recommendation models we choose to use. The second has a high-quality open-source implementation we can use as reference (+ the authors have a slack where they are reachable).

Decentralized machine learning using compressed push-pull averaging
- From the same authors as the paper the distributed-ai-kernel was based on
- Introduces a more communication-efficient gossiping algorithm (compressed push-pull)
- Finetunes final layers of pre-trained deep networks for better performance despite training via gossip
Powergossip [Code] [Paper]
- Introduces an algorithm that directly compresses the model differences between neighboring workers using low-rank linear compressors applied on model differences.
- Builds on + implements ChocoSGD / ChocoGossip [Summary] [Paper]

User network based

Once we have basic feature-based recommendations and a like button in the interface, we can think of bootstrapping some network-based collaborative filtering. This can be based on an assembly of info (music features, listening history, and likes) to find similar users so we can recommend based on similarity. The first paper and last 2 papers go over how this can be done in a privacy-preserving way.

A Peer-to-Peer Recommender System with Privacy Constraints
- ftp://www.kom.tu-darmstadt.de/papers/PKFS09_567.pdf
- Builds a federated social network and introduces a privacy preserving item-based collaborative filtering algorithm that uses 'user similarity' in the social network for recommendations.
- Builds privacy on top of Distributed Collaborative Filtering for Peer-to-Peer File-Sharing Systems
A peer-to-peer recommender system for self-emerging user communities based on gossip overlays
- Similar to above, tries to create recommendations by first associating similar users into groups
- Not privacy-focused, but perhaps extensible to be
Radiommender: P2P On-line Radio with a Distributed Recommender System
- Almost identical task
- not privacy-focused
Personalized and Private Peer-to-Peer Machine Learning
P2P-based PVR Recommendation using Friends, Taste Buddies and Superpeers

Centralized server approaches

Many approaches use a centralized parameter server to keep global updates. It seems plausible that a true central server could be replaced by a parameter server running locally on each device. These parameter servers could then share their weight updates via gossiping.
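A rough sketch of that idea, assuming each device keeps its own copy of the weights and that "sharing updates" means plain pairwise averaging. The names `GossipPeer`, `gossipAverage` and `runGossip` are hypothetical, and everything runs in memory instead of over IPv8.

```kotlin
import kotlin.random.Random

/** Hypothetical in-memory peer holding a local copy of the model weights. */
class GossipPeer(val weights: DoubleArray)

/** Pairwise gossip step: both peers replace their weights with the element-wise mean. */
fun gossipAverage(a: GossipPeer, b: GossipPeer) {
    require(a.weights.size == b.weights.size) { "models must have the same shape" }
    for (i in a.weights.indices) {
        val mean = (a.weights[i] + b.weights[i]) / 2.0
        a.weights[i] = mean
        b.weights[i] = mean
    }
}

/** In every round each peer gossips with one randomly chosen peer (skipping itself). */
fun runGossip(peers: List<GossipPeer>, rounds: Int, random: Random = Random(42)) {
    repeat(rounds) {
        for (p in peers) {
            val other = peers[random.nextInt(peers.size)]
            if (other !== p) gossipAverage(p, other)
        }
    }
}
```

Averaging preserves the sum of the weights across the network, so repeated exchanges pull every peer towards the global mean without any single node acting as the parameter server.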

synctext commented 3 years ago

Possible gameplan for this entire project:

machine learning; please use collaborative filtering
- stuff without machine learning that worked 16 years ago: https://grouplens.org/beyond2005/full/pouwelse.pdf
- example code explained
- That repo has initial code
- Focus on this for simplicity
playlist history (played magnet link since install; create simple database table)
- Lots of magnet links: http://bt.etree.org/index.php?sort=seeders
- the Million Playlists Songs Dataset page (just an example what is out there, just focus on getting it to work for 5 users)
- @drew2a we can seed this content for you: 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres and expand it with a crawl of 55,000 full audio tracks of Jamendo
IPv8 gossip of playlist history for debugging
User interface (your playlist history, others playlist history (debug-only), recommender)
MusicDAO now "production software" with real users in the wild, https://torrentfreak.com/university-runs-massive-bittorrent-seedbox-to-showcase-music-streaming-app-210220/
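As a starting point for the collaborative filtering item above, here is a small sketch that recommends tracks from gossiped playlist histories. It is a simple neighbour-overlap heuristic, not the algorithm from the linked paper; the function name, the user ids and the track ids (in practice these could be magnet links) are assumptions for illustration.

```kotlin
/**
 * Recommends tracks the user has not played yet, scored by how much the
 * listeners of each track overlap with the user's own play history.
 * `histories` maps a (hypothetical) user id to the set of track ids played.
 */
fun recommendFromHistories(
    histories: Map<String, Set<String>>,
    user: String,
    topN: Int = 3
): List<String> {
    val mine = histories[user] ?: emptySet()
    val scores = mutableMapOf<String, Int>()
    for ((other, played) in histories) {
        if (other == user) continue
        val overlap = played.count { it in mine } // shared taste with this peer
        if (overlap == 0) continue
        for (track in played) {
            if (track !in mine) scores[track] = (scores[track] ?: 0) + overlap
        }
    }
    // Highest score first; ties are broken alphabetically to keep output deterministic.
    return scores.toList()
        .sortedWith(compareByDescending<Pair<String, Int>> { it.second }.thenBy { it.first })
        .take(topN)
        .map { it.first }
}
```

With only 5 users this fits comfortably in memory, which matches the advice to get it working at small scale first.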

For next sprint: make progress on the above issues and report progress for next meeting.

EDIT: plus, this plug-in system for the "dApp" approach works fully, but usability is -100%: https://github.com/Tribler/trustchain-superapp/blob/master/freedomOfComputing/README.md

Bonus extension: integrate a compiler inside the superapp, to enable source code distribution and local compilation: https://play.google.com/store/apps/details?id=ru.iiec.jvdroid&hl=en_GB&gl=US (fewer security hazards, source code inspection)

EDIT2 (for future reference): Java compiler on Android: https://github.com/t-arn/java-ide-droid, heroic efforts for memory overflow, and https://play.google.com/store/apps/details?id=com.krazeapps.kotlinprogrammingcompiler

drew2a commented 3 years ago

Two suggestions from @drew2a side

Use a separated branch

As we are "production software" now, I suggest performing all the development of the Superapp in a separate branch to prevent interfering with the published code from the master branch.

To do this, we need to create a branch in https://github.com/Tribler/trustchain-superapp (e.g. feature/music-dao-recommendations). Then, all of our commits will go to this branch. At the end of the project, this branch will be merged to master.

If you are not familiar with git, take a look at https://www.gitkraken.com/

Split the network until the project is done

It will be tough to debug and test a modified application within the wild network. There are multiple ways we can make life easier for ourselves.

I propose the following approach:

Deploy a dedicated bootstrap server (or a few). I can help with that.
Change (or override) a list of bootstrap servers (see an example here)

PS

> @drew2a we can seed this content for you: 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres and expand it with a crawl of 55,000 full audio tracks of Jamendo

Wow :)

JCBrouwer commented 3 years ago

> Use a separated branch
>
> As we are "production software" now, I suggest performing all the development of the Superapp in a separate branch to prevent interfering with the published code from the master branch.

Yes we're developing in branches on this fork. We can merge branches from there directly to the upstream master via pull request.

> I propose the following approach:
>
> 1. Deploy a dedicated bootstrap server (or a few). I can help with that.
>
> 2. Change (or override) a list of bootstrap servers (see an example [here](https://github.com/Tribler/kotlin-ipv8/commit/32a7286e20c254bd747be47c063fbcbdfe93d532))

Ok sweet, that sounds good.

> @drew2a we can seed this content for you: 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres and expand it with a crawl of 55,000 full audio tracks of Jamendo

Who sent you this? That definitely sounds like it can help our issues with amount of data!

drew2a commented 3 years ago

> Who sent you this? That definitely sounds like it can help our issues with amount of data!

Johan sent me this

JCBrouwer commented 3 years ago

> > Who sent you this? That definitely sounds like it can help our issues with amount of data!
>
> Johan sent me this

Ahh I read over it, nice!

MarkovChainmail commented 3 years ago

Andrei told me you can find the data storage here

nata1y commented 3 years ago

Main updates: We are in the process of developing a federated ML service for the MusicDAO. We are currently facing some issues integrating it with IPv8 and would like to discuss them during the meeting. We have changed the ML model so that we no longer use a feature-based approach. However, due to the limit on the size of a model that we can send over the network, we likely do not want to store all possible song-song combinations in it (with the corresponding similarity value/weight). Thus, the idea is to choose a predetermined set of songs (most popular? just randomly chosen ones for now?). Is this an acceptable approach?

synctext commented 3 years ago

Discussed issues:

clean ORM for Android, remove loose SQL in Debug, MusicDAO and Peerchat
Skeleton Machine learning model in superapp and ? initial exchange of serialised vectors in IPv8
TUDelft Seedbox in 2-3 weeks with more songs to recommend coming (@drew2a)

nata1y commented 3 years ago

Current progress:

Refactoring of ml models to operate with a sparse array instead of normal array
Tried to migrate trustchain to kotlin 1.4.30, still experiencing some bugs.
Start with db integration
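To illustrate the sparse-array refactoring mentioned above, a minimal sketch of a sparse vector backed by a map. The class name `SparseVector` and its methods are hypothetical and are not taken from the project code.

```kotlin
/** Sparse vector storing only non-zero entries, keyed by feature (or song) index. */
class SparseVector(private val entries: MutableMap<Int, Double> = mutableMapOf()) {
    /** Missing indices read as 0.0. */
    operator fun get(index: Int): Double = entries[index] ?: 0.0

    /** Writing 0.0 removes the entry so the vector stays sparse. */
    operator fun set(index: Int, value: Double) {
        if (value == 0.0) entries.remove(index) else entries[index] = value
    }

    /** Number of explicitly stored (non-zero) entries. */
    val nonZeroCount: Int get() = entries.size

    /** Dot product that iterates over the smaller of the two vectors. */
    fun dot(other: SparseVector): Double {
        val (small, large) = if (entries.size <= other.entries.size) this to other else other to this
        var sum = 0.0
        for ((i, v) in small.entries) sum += v * large[i]
        return sum
    }
}
```

Memory and message size then scale with the number of songs a user has actually interacted with, not with the size of the whole catalogue.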

Questions: it seems that a song that has been listened to gets downloaded into some local directory. We could not find the location of this directory. Where is it?

nata1y commented 3 years ago

Here is the link to the repo: https://github.com/JCBrouwer/trustchain-superapp

main updates are on dev branch

synctext commented 3 years ago

Discussion of progress, 1 team member present. Please focus on getting code to compile and run as :1st_place_medal: priority. Then make stuff fancy. No need to repair or improve the superapp. The Smile library is huge (ML, NLP, etc.); please ignore it and do a low-performance, simple approach from scratch. Superapp might have an issue with multiple Kotlin versions mixed.

nata1y commented 3 years ago

Updates 15.03:

fully migrated superapp to kotlin 1.4.30
able to create/restore local recommendation model and display music suggestions based on it
added recommendation fragment to music DAO (see picture below)

However:

we are still not sure that gossiping works, we need to test it properly
we are still experiencing troubles with retrieving local mp3 data and, therefore, current models give almost random recommendations
trustchain transactions have info about artist, publisher, year, and song. Are we able to retrieve any other type of data? Would it be ok to download "candidate recommendations" somewhere locally so that we are able to access mp3 data such as genre or bpm to achieve better predictions?

Screenshot from 2021-03-13 22-52-31

synctext commented 3 years ago

Making great progress! Real recommendation screenshot.
Running code is vital, so if it's difficult to guarantee privacy, then leave that as future work (item metadata issue)
This week sprint: making minimal viable product of recommender that is superior to random
integration of everything is not easy, especially gossip.
Focus on feature, something like, creation year. For instance, 2 profiles of old songs; 2 profiles of fresh songs. Recommend based on 1 song that is playing now (either old or fresh).
Need for binary transfer?
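The creation-year idea above can be sketched in a few lines: recommend the candidates whose year is closest to the song playing now. The `Track` type and `recommendByYear` function are assumptions made for this example.

```kotlin
import kotlin.math.abs

/** Hypothetical track record; only the release year is used as a feature. */
data class Track(val title: String, val year: Int)

/** Recommends the `topN` candidates whose release year is closest to the current track. */
fun recommendByYear(nowPlaying: Track, candidates: List<Track>, topN: Int = 2): List<Track> =
    candidates
        .filter { it != nowPlaying }                  // never recommend the song already playing
        .sortedBy { abs(it.year - nowPlaying.year) }  // stable sort keeps input order on ties
        .take(topN)
```

With two old and two fresh songs as candidates, a listener playing an old song gets the old pair back, which is already better than a random pick.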

nata1y commented 3 years ago

Main updates:

we are able to gossip models between our peers when running app in android studio simultaneously
we came across this paper that describes gossiping algorithm for collaborative filtering approach and implemented Matrix factorization model
in the case of the feature-based model, we are able to retrieve song data; however, it is often very limited or completely blank. Moreover, it does not seem to include (many) useful features. Therefore, the only information we use that is already available is 'genre' and 'year'. On top of that, we have integrated the essentia library into the superapp and are now able to extract around 225 features for a song (some might be empty).
we added model tests
we added 'feature gossiping' for songs users do not have locally, but we haven't tested it properly yet.

We are planning to concentrate on testing everything with 2+ peers in the upcoming weeks. We would also like to add bayesian learning from here to achieve more accurate personal ranking learning.
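For readers unfamiliar with matrix factorization, a minimal local sketch of the SGD update on (user, song, rating) triples. This is a generic textbook formulation, not the gossip algorithm from the paper and not the project's implementation; the class name, hyperparameters and seed are assumptions.

```kotlin
import kotlin.random.Random

/** Minimal matrix factorization trained with SGD (illustrative only). */
class MatrixFactorization(
    users: Int,
    songs: Int,
    private val k: Int = 4,          // number of latent factors
    private val lr: Double = 0.05,   // learning rate
    private val reg: Double = 0.01,  // L2 regularisation strength
    seed: Int = 0
) {
    private val rng = Random(seed)
    val userFactors = Array(users) { DoubleArray(k) { rng.nextDouble() * 0.1 } }
    val songFactors = Array(songs) { DoubleArray(k) { rng.nextDouble() * 0.1 } }

    /** Predicted rating: dot product of the user and song factor vectors. */
    fun predict(u: Int, s: Int): Double {
        var p = 0.0
        for (f in 0 until k) p += userFactors[u][f] * songFactors[s][f]
        return p
    }

    /** One SGD step on a single observed rating. */
    fun update(u: Int, s: Int, rating: Double) {
        val err = rating - predict(u, s)
        for (f in 0 until k) {
            val pu = userFactors[u][f]
            val qs = songFactors[s][f]
            userFactors[u][f] += lr * (err * qs - reg * pu)
            songFactors[s][f] += lr * (err * pu - reg * qs)
        }
    }
}
```

In a gossip setting, one plausible split is to keep the user factors private on the device and exchange only the song factors with peers, but that split is an assumption here, not something stated in this thread.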

synctext commented 3 years ago

Again, solid progress.
More testing needed; possibly x86 and non-interactive file-based input?
Please focus on wrapping up and Superapp polish; "feature freeze"
For final grading: Your Pull Request, Readme addition, and functional .APK

drew2a commented 3 years ago

@nata1y @JCBrouwer This is an example of working with kotlin-ipv8 outside of android: https://github.com/Tribler/trustchain-superapp/blob/master/musicdao-datafeeder/src/main/java/com/example/musicdao_datafeeder/DataFeeder.kt

./gradlew :musicdao-datafeeder:run --args="/home/user/torrents nopublish"

drew2a commented 3 years ago

A minimum program might look like this:

// MusicDAO overlay backed by an in-memory TrustChain store (nothing is persisted to disk).
fun musicCommunity(): OverlayConfiguration<MusicCommunity> {
    val driver: SqlDriver = JdbcSqliteDriver(JdbcSqliteDriver.IN_MEMORY)
    Database.Schema.create(driver)
    return OverlayConfiguration(
        factory = MusicCommunity.Factory(
            settings = TrustChainSettings(),
            database = TrustChainSQLiteStore(Database(driver))
        ),
        walkers = listOf(RandomWalk.Factory())
    )
}

// Discovery overlay used to find other peers in the network.
fun discoveryCommunity() = OverlayConfiguration(
    factory = DiscoveryCommunity.Factory(),
    walkers = listOf(
        RandomWalk.Factory(timeout = 3.0, peers = 20),
        RandomChurn.Factory(),
        PeriodicSimilarity.Factory()
    )
)

// IPv8 instance listening on UDP port 8090 on all interfaces, without Bluetooth.
fun ipv8() = IPv8(
    endpoint = EndpointAggregator(
        udpEndpoint = UdpEndpoint(
            port = 8090,
            ip = InetAddress.getByName("0.0.0.0")
        ), bluetoothEndpoint = null
    ),
    configuration = IPv8Configuration(
        overlays = listOf(
            discoveryCommunity(),
            musicCommunity()
        ), walkerInterval = 1.0
    ),
    // A fresh key pair is generated on every start, so the peer identity is not stable.
    myPeer = Peer(JavaCryptoProvider.generateKey())
)

fun main(args: Array<String>) {
    val ipv8 = ipv8()
    ipv8.start()
}

JCBrouwer commented 3 years ago

Updates (Penultimate Edition?)

Added a little randomness to recommendations
Added a lot of comments and cleaned up code (imports, format, naming, logging)
Started on README/documentation
More testing of gossiping, works well for both features and models provided peers can get audio files correctly
Thorough testing of matrix factorization model
- Validation tests with 10 users in 4 preference groups
  - Each group has 2-3 users that are missing one song from their song group
- Round robin gossiping converges to the same song features across the network
- Predictions are correct, suggesting the last song of the group in more than 90% of cases
~Still having issues with the feature based model~
- ~Essentia extractor very rarely throws SIGABRTs or segfaults~
  - ~Probably related to corrupted/half-downloaded mp3 files (not easy to replicate on a new install)~
  - ~Cannot catch these errors gracefully because they happen in native code (causes crashes)~
- ~Possible solutions~
  1. More validation of MP3 files before feeding to Essentia
  2. ~Try to find a different music extractor which is more stable (e.g. JAudio)~
  3. ~Try embedding python essentia API instead (perhaps this happens to catch these errors more robustly)~
  4. ~Remove essentia and try to predict only based on block metadata (performance will probably be terrible)~
  5. ~Remove feature based model altogether :(~
- Last minute fix seems to have solved this in the case that was consistently broken (related to multiple background threads clashing)
- Just need to add some final validation tests to the feature-based model

synctext commented 3 years ago

https://github.com/JCBrouwer/trustchain-superapp/blob/master/app-debug-gossipML.apk?raw=true
Sorry, please reproduce and fix (Android 10):
Great to see that successful completion is in sight
Ambitious task in this master course and looking very promising
Focus is on getting everything ready. Then documentation and last-week polish.
1 week left: team is confident they'll make the self-imposed deadline
For Perfect Polish: add an animated .GIF where you see everything working. Suitable for Twitter. Plus what impressive math did you get operational? Like:

JCBrouwer commented 3 years ago

PR 1

[x] validate prediction performance of feature-based model
[x] test apk on real phones
[x] fix debug application in superapp
[x] move essentia to submodule
[x] squash commits to make main PR easier

PR 2

[x] add comments to matrix factorization code
[x] more high-level description in main README
[x] describe architecture of our code
[x] create diagram of federated gossip learning for recommendations
[x] create GIF of recommendations

synctext commented 3 years ago

Raw .APK: https://github.com/JCBrouwer/trustchain-superapp/raw/federated-music-recommendation/gossipML/app-debug-gossipML.apk To conclude:

amazing work by only 2 people
big commit with lots of (json) code
very advanced federated learning
Readable readme
student team got state-of-the-art science operational and first experimental results
the MusicDAO as a whole barely works. Music discovery slow, not repaired and that greatly influences the demo experience
Worked hard and focussed!
feedback: the blockchain engineering course could provide better intro to Superapp at start. Difficult to get going in this course in first weeks.
agreement: wrap up pull request, then course is completed.

JCBrouwer commented 3 years ago

Federated music recommendation has been merged :tada:
