LabeliaLabs / distributed-learning-contributivity

Simulate collaborative ML scenarios, experiment multi-partner learning approaches and measure respective contributions of different datasets to model performance.

Apache License 2.0

New contributivity measurement based on statistical distances between two distributions:

The partner-specific probability distribution of the label given the input (estimated via maximum likelihood on the partner's data);
The latent joint probability distribution of the label given the input (estimated via maximum likelihood on the joint dataset).

The difference between these two distributions is interpreted as noise. This allows us to use a multi-headed adaptation of the s-model method to the multi-partner case, in order to estimate and quantify this pseudo-noise.
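The multi-headed idea can be sketched with NumPy under one assumption: as in a simple noise adaptation layer, each partner k has its own input-independent, row-stochastic transition matrix theta_k that maps the shared model's distribution p(z|x) to that partner's label distribution p_k(y|x). The names `partner_heads`, `transition_logits` and `nll` are hypothetical and are not the mplc API.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along `axis`
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def partner_heads(clean_probs, transition_logits):
    """Per-partner label distributions p_k(y|x) from a shared p(z|x).

    clean_probs: (n_samples, n_classes), output of the shared model.
    transition_logits: (n_partners, n_classes, n_classes), hypothetical
        trainable weights; softmax over the last axis turns row z of
        partner k into theta_k[z, :] = p_k(y | z).
    Returns an array of shape (n_partners, n_samples, n_classes).
    """
    theta = softmax(transition_logits, axis=-1)  # row-stochastic matrices
    # p_k(y|x) = sum_z p(z|x) * theta_k[z, y], for every partner k at once
    return np.einsum("nz,kzy->kny", clean_probs, theta)


def nll(probs, labels, eps=1e-12):
    # Mean negative log-likelihood of integer labels; minimizing it over a
    # partner's own samples is the maximum-likelihood fit of that head
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))


rng = np.random.default_rng(0)
clean = softmax(rng.normal(size=(4, 3)))  # stand-in for model predictions
# Partner 0: large diagonal logits, so theta_0 is close to the identity
# (its labels agree with the shared model). Partner 1: random transition.
logits = np.stack([np.eye(3) * 10.0, rng.normal(size=(3, 3))])
heads = partner_heads(clean, logits)
print(heads.shape)  # (2, 4, 3)
print(nll(heads[1], np.array([0, 1, 2, 0])))
```

Sharing one base model and giving each partner only a small transition matrix keeps the per-partner "noise" separate from what all partners agree on.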

Computing these contributivity metrics only requires inference with the trained model (trained via FedSmodel).

The additional computational cost is therefore negligible. The method also doesn't need a 'perfect' global test dataset.
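Because only predictions are needed, a per-partner quantity can be sketched as the mean distance between the shared output p(y|x) and the partner's head p_k(y|x) over that partner's own samples. This sketch assumes the Hellinger distance and a plain mean; `inference_only_distances` is a hypothetical name, and how the PR turns such a distance into a final contributivity score is not shown here.

```python
import numpy as np


def hellinger_rows(p, q):
    # Row-wise Hellinger distance between discrete distributions, in [0, 1]
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2, axis=-1))


def inference_only_distances(clean_by_partner, head_by_partner):
    """Hypothetical per-partner distance computed from predictions alone.

    For each partner k, both dicts hold predictions on k's own samples:
    p(y|x) from the shared output and p_k(y|x) from partner k's head.
    Only forward passes are needed; there is no retraining and no global
    test set.
    """
    return {
        k: float(np.mean(hellinger_rows(clean_by_partner[k], head_by_partner[k])))
        for k in clean_by_partner
    }


clean = {"A": np.array([[0.9, 0.1], [0.2, 0.8]]), "B": np.array([[0.7, 0.3]])}
heads = {"A": np.array([[0.9, 0.1], [0.2, 0.8]]), "B": np.array([[0.3, 0.7]])}
distances = inference_only_distances(clean, heads)
print(distances["A"])  # 0.0: A's head reproduces the shared distribution
```

A smaller distance means the partner's labels are closer to the latent joint distribution; partner "B" above, whose head flips the shared prediction, gets a clearly larger value.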

For now, 3 distances are implemented:

Kullback-Leibler divergence
Hellinger metric
Bhattacharyya distance
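The three distances can be sketched for discrete label distributions (for example, two softmax output rows) using their standard textbook definitions. The function names are hypothetical, and the PR's own implementation may differ in normalization or numerical safeguards.

```python
import numpy as np

EPS = 1e-12  # guards log(0) and division by zero


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); not symmetric."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0  # terms with p_i = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], EPS))))


def bhattacharyya_coefficient(p, q):
    # BC(p, q) = sum_i sqrt(p_i * q_i), equal to 1 when p == q
    return float(np.sum(np.sqrt(np.asarray(p, float) * np.asarray(q, float))))


def hellinger(p, q):
    """H(p, q) = sqrt(1 - BC(p, q)), bounded in [0, 1]."""
    return float(np.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q))))


def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln BC(p, q); 0 when p == q, unbounded above."""
    return float(-np.log(max(bhattacharyya_coefficient(p, q), EPS)))


p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
for name, fn in [("KL", kl_divergence), ("Hellinger", hellinger),
                 ("Bhattacharyya", bhattacharyya_distance)]:
    print(name, round(fn(p, q), 4))
# KL is asymmetric: kl_divergence(q, p) gives a different value
```

Hellinger and Bhattacharyya are both built on the same coefficient BC and are symmetric, while KL is not, so the argument order matters when KL is used as a contributivity distance.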

These metrics are tested on the reference scenarios; see the Colab notebook: https://colab.research.google.com/drive/1DN1lLdd1b1ZmttmEiQKpx8xW5guEf_f_?usp=sharing

TODO

[ ] Add doc
[x] Investigate over s-model bug when using Advanced or Flexible splitter
[x] Handle dict contributivity score for result dataframe
[ ] Investigate std computations

Codecov Report

Merging #346 (078cbca) into master (ecc3ea8) will decrease coverage by 0.19%. The diff coverage is 80.37%.

```diff
@@            Coverage Diff             @@
##           master     #346      +/-   ##
==========================================
- Coverage   80.68%   80.49%   -0.20%
==========================================
  Files          15       15
  Lines        3045     3128      +83
==========================================
+ Hits         2457     2518      +61
- Misses        588      610      +22
```

| Impacted Files | Coverage Δ |
|---|---|
| `mplc/multi_partner_learning/__init__.py` | `100.00% <ø> (ø)` |
| `mplc/multi_partner_learning/basic_mpl.py` | `84.98% <ø> (-0.29%)` |
| `mplc/multi_partner_learning/fast_mpl.py` | `61.09% <55.31%> (-0.81%)` |
| `mplc/contributivity.py` | `77.23% <100.00%> (+0.67%)` |
| `mplc/scenario.py` | `83.27% <100.00%> (+0.77%)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update ecc3ea8...078cbca.

Contrib dist stat #346
