adapt-python / adapt

Awesome Domain Adaptation Python Toolbox
https://adapt-python.github.io/adapt/
BSD 2-Clause "Simplified" License

TrAdaBoost Weight Updating Issue #112

Open simon-minami opened 1 year ago

simon-minami commented 1 year ago

I may be mistaken, but I believe there may be an error in the TrAdaBoost weight updating formula on line 386.

beta_t = estimator_error / (2. - estimator_error)

If I'm not mistaken, the original paper by Dai et al. specifies:

beta_t = estimator_error / (1. - estimator_error)

I was getting some unexpected results when running TrAdaBoost, and making this change seemed to fix them. However, this may be purely coincidental and I may be wrong. Thanks in advance!
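For context, here is a minimal sketch of the per-round weight update as described in Dai et al. (2007), using beta_t = estimator_error / (1. - estimator_error). The function names and the miss indicators are illustrative only (not the library's API), and the sketch assumes a weighted error strictly inside (0, 0.5) so beta_t is nonzero:

```python
import math

def paper_betas(estimator_error, n_source, n_steps):
    """Betas from Dai et al. (2007); assumes estimator_error in (0, 0.5)."""
    # Per-round beta used to up-weight misclassified target instances
    beta_t = estimator_error / (1.0 - estimator_error)
    # Fixed beta used to down-weight misclassified source instances
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_steps))
    return beta, beta_t

def update_weights(w_src, w_tgt, miss_src, miss_tgt, beta, beta_t):
    """One TrAdaBoost weight update; miss_* are 0/1 misclassification flags."""
    # Misclassified source instances are down-weighted (beta < 1),
    # misclassified target instances are up-weighted (beta_t ** -1 > 1).
    new_src = [w * beta ** m for w, m in zip(w_src, miss_src)]
    new_tgt = [w * beta_t ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt
```

With an error of 0.25, beta_t is 1/3, so a misclassified target instance has its weight tripled while correctly classified instances keep theirs unchanged.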

antoinedemathelin commented 11 months ago

Hi @simon-minami, Thank you for raising this important point!

In fact, we take some liberties with the original paper here, in order to extend TrAdaBoost to multiclass classification.

In Dai et al., the algorithm is only suited for binary classification. In that case, if the classifier error is above 0.5, the opposite classifier is taken instead, so the classification error always lies in [0, 0.5]. For multiclass classification this trick cannot be used, and the classification error lies in [0, 1].

To compute beta in this setting, we chose the formula beta_t = estimator_error / (2. - estimator_error), with the underlying idea that beta_t equals zero when the error is minimal and one when the error is maximal, just as the formula beta_t = estimator_error / (1. - estimator_error) does when estimator_error lies in [0, 0.5].
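The two choices of beta can be compared directly; the function names below are illustrative only:

```python
def beta_binary(err):
    # Dai et al.: error assumed in [0, 0.5]
    return err / (1.0 - err)

def beta_multiclass(err):
    # adapt's extension: error allowed in [0, 1]
    return err / (2.0 - err)

# Both map the minimal error to 0 and the maximal error of their domain to 1:
assert beta_binary(0.0) == 0.0 and beta_binary(0.5) == 1.0
assert beta_multiclass(0.0) == 0.0 and beta_multiclass(1.0) == 1.0
```

At the endpoints the two mappings agree, but inside (0, 0.5) the multiclass variant yields smaller betas (e.g. for an error of 0.25, beta_binary gives 1/3 while beta_multiclass gives 1/7), which may explain the differences observed when running TrAdaBoost.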

Notice that the two formulas are not strictly equivalent in binary classification. However, we ran some tests on toy datasets and observed that both implementation choices lead to similar results.