LabeliaLabs / distributed-learning-contributivity

Simulate collaborative ML scenarios, experiment with multi-partner learning approaches and measure the respective contributions of different datasets to model performance.
https://www.labelia.org
Apache License 2.0

make the library label agnostic #252

Open arthurPignet opened 3 years ago

arthurPignet commented 3 years ago

Currently, most of the functions assume that the labels are one-hot encoded vectors.

Besides not being flexible, this is a problem because we sometimes need to work with label indices (the first label is indexed 1, the second label 2, and so on). One solution could be to automatically add a dict of labels at dataset generation, where the keys would be integers and the values would be, for instance, the corresponding vectors.

An example with MNIST:

dataset.dic_label = {
    0: [1,0,0,0,0,0,0,0,0,0],
    1: [0,1,0,0,0,0,0,0,0,0],
    2: [0,0,1,0,0,0,0,0,0,0],
    3: [0,0,0,1,0,0,0,0,0,0],
    4: [0,0,0,0,1,0,0,0,0,0],
    5: [0,0,0,0,0,1,0,0,0,0],
    6: [0,0,0,0,0,0,1,0,0,0],
    7: [0,0,0,0,0,0,0,1,0,0],
    8: [0,0,0,0,0,0,0,0,1,0],
    9: [0,0,0,0,0,0,0,0,0,1],
}
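A minimal sketch of how such a dict could be generated automatically rather than written by hand, assuming only that the number of classes is known at dataset generation. The helper names `build_label_dict` and `one_hot_to_index` are hypothetical and not part of the library; only the standard library is used here.

```python
def build_label_dict(num_classes):
    """Map each integer label index to its one-hot vector (hypothetical helper)."""
    return {
        index: [1 if position == index else 0 for position in range(num_classes)]
        for index in range(num_classes)
    }


def one_hot_to_index(vector, label_dict):
    """Look up the integer key whose one-hot vector equals `vector`."""
    for index, one_hot in label_dict.items():
        if list(vector) == one_hot:
            return index
    raise ValueError(f"unknown label vector: {vector}")


# Reproduces the MNIST example: 10 classes, keys 0 to 9.
dic_label = build_label_dict(10)
index_of_three = one_hot_to_index([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dic_label)
```

With a dict like this, code that needs label indices can convert in both directions without assuming anything about the encoding beyond what the dict states.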

bowni commented 3 years ago

Is this still relevant @arthurPignet ?

arthurPignet commented 3 years ago

Yes, it is. The split of the data between partners is label agnostic, but that is not the case for the shuffling/corruption. Basically, the only type of labels accepted is one-hot.
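As an illustration of what label-agnostic shuffling and corruption might look like, here is a rough sketch that never inspects the encoding of a label and only compares labels for equality. The functions `shuffle_labels` and `corrupt_labels` and their parameters are hypothetical and are not the library's current implementation. The sketch assumes the set of possible labels is passed in explicitly and contains at least two distinct values.

```python
import random


def shuffle_labels(labels, seed=None):
    """Return a shuffled copy of `labels`, whatever their encoding."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def corrupt_labels(labels, label_set, proportion=1.0, seed=None):
    """Replace a proportion of the labels with a different label from `label_set`.

    Assumes `label_set` holds at least two distinct labels.
    """
    rng = random.Random(seed)
    corrupted = list(labels)
    n_corrupt = int(len(corrupted) * proportion)
    # Pick which positions to corrupt, then swap each for any other label.
    for position in rng.sample(range(len(corrupted)), n_corrupt):
        candidates = [label for label in label_set if label != corrupted[position]]
        corrupted[position] = rng.choice(candidates)
    return corrupted


# The same functions accept integer, string, or one-hot labels.
int_shuffled = shuffle_labels([0, 1, 2, 1, 0], seed=0)
one_hot_corrupted = corrupt_labels([[1, 0], [0, 1], [1, 0]], [[1, 0], [0, 1]], seed=0)
str_corrupted = corrupt_labels(["cat", "dog"], ["cat", "dog"], seed=0)
```

Because the only operation applied to a label is an equality test, the same code path covers one-hot vectors, integer indices, and string labels.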