epfml / disco

DISCO is a code-free and installation-free browser platform that allows any non-technical user to collaboratively train machine learning models without sharing any private data.
https://discolab.ai
Apache License 2.0

incentives / rewards for data utility #11

Open martinjaggi opened 4 years ago

martinjaggi commented 4 years ago

this is a big one, we'll look into it a bit later

DwaraknathT commented 4 years ago

Hi Martin, when I read the label, what I think of is some form of incentive or ranking mechanism that ranks the individual components in the network according to their participation in the overall training. A recent work I read tackles something similar: BitTensor: An Intermodel Intelligence Measure (https://arxiv.org/abs/2003.03917). The goal in the paper, however, is a peer-to-peer-like model in which individual players work on their own tasks (their own datasets and loss functions), and the question is how to rank individual members based on their contribution to the entire network. The paper also explores an asynchronous, distributed way of training individual models while making use of the knowledge gained by their peers, which avoids having to relearn everything from scratch. It also gives a very detailed analysis of how to build a fair "market system"-like scenario. Hopefully this is of interest or use.

martinjaggi commented 4 years ago

thanks! this is interesting. however, if i understand correctly, the paper assumes that all participants are truthful/honest about their data and labels. i haven't read all the details though. this might be a starting point, but in reality we might need robust algorithms that also tolerate a fraction of the participants providing bad or fake data (e.g. based on Byzantine-robust SGD or similar).
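
To make the robustness point concrete, here is a minimal sketch of two robust aggregation rules that are common in the Byzantine-robust learning literature: coordinate-wise median and coordinate-wise trimmed mean. It is only an illustration, not what DISCO or the paper above implements. The function names (`mean_aggregate`, `median_aggregate`, `trimmed_mean_aggregate`) and the toy gradients are made up for this example. It also covers only the robustness side and says nothing about how to reward useful data.

```python
from statistics import fmean, median
from typing import List, Sequence

Vector = Sequence[float]


def mean_aggregate(updates: Sequence[Vector]) -> List[float]:
    """Plain averaging: a single bad update can shift the result arbitrarily."""
    return [fmean(coords) for coords in zip(*updates)]


def median_aggregate(updates: Sequence[Vector]) -> List[float]:
    """Coordinate-wise median: tolerates a minority of arbitrary updates."""
    return [median(coords) for coords in zip(*updates)]


def trimmed_mean_aggregate(updates: Sequence[Vector], trim: int) -> List[float]:
    """Coordinate-wise trimmed mean: drop the `trim` smallest and largest values."""
    if trim < 0 or 2 * trim >= len(updates):
        raise ValueError("need 0 <= trim and 2 * trim < number of updates")
    result = []
    for coords in zip(*updates):
        # Sort each coordinate across peers and discard the extremes on both ends.
        kept = sorted(coords)[trim:len(updates) - trim]
        result.append(fmean(kept))
    return result


# Hypothetical data: four honest peers report gradients near [1.0, -2.0],
# and one peer sends fake values.
honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
byzantine = [[1000.0, 1000.0]]
updates = honest + byzantine

print(mean_aggregate(updates))                  # pulled far away by the fake update
print(median_aggregate(updates))                # stays at the honest values
print(trimmed_mean_aggregate(updates, trim=1))  # stays close to the honest values
```

In this toy run the plain mean is dominated by the single fake update, while the median and the trimmed mean stay near the honest gradients. Both robust rules only help while bad participants remain a minority, and `trim` has to be at least the number of bad updates expected per round.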