alinlab / LfF

Learning from Failure: Training Debiased Classifier from Biased Classifier (NeurIPS 2020)

Incorrect accuracy computation #5

Open ktkachuk opened 2 years ago

ktkachuk commented 2 years ago

In train.py, valid_accs_b and valid_accs_d are computed as the mean of the attribute-wise mean accuracies. This yields incorrect results if the class and bias labels are not equally distributed.

mvandenhi commented 2 years ago

Hey ktkachuk, as I understand the code, they first compute the attribute-wise accuracies (without taking the mean), which takes the size of the respective groups into account. The mean attribute-wise accuracy is only computed afterwards, starting from lines 278-300. Best, Moritz

ktkachuk commented 2 years ago

Hey Moritz, thank you for your reply. The problem is that the attribute-wise accuracies do not carry the number of samples in the respective groups, and in general the mean of accuracies is not the accuracy of the model. E.g. if we have two classes with accuracies of 90% and 10%, the mean is 50%, but that tells us nothing about the number of samples. If we had 100 samples in the 1st class and 10 samples in the 2nd class, the real accuracy would be about 83% (91/110).
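A minimal sketch of that arithmetic (the numbers are the hypothetical ones from the example above, not from the repo):

```python
import numpy as np

# Hypothetical example: class 1 has 100 samples at 90% accuracy,
# class 2 has 10 samples at 10% accuracy.
correct = np.array([90, 1])        # correct predictions per class
total = np.array([100, 10])        # samples per class
per_class_acc = correct / total    # [0.90, 0.10]

mean_of_accs = per_class_acc.mean()          # 0.50  -- unweighted mean of accuracies
overall_acc = correct.sum() / total.sum()    # 91/110 ~ 0.83 -- true overall accuracy
```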

mvandenhi commented 2 years ago

I checked again and yeah, I think you're right. This computation only makes sense under the assumption that every combination of bias and target attribute is equally likely to appear, so that the accuracy of each combination can be estimated separately and then averaged under that assumption.

But I guess that's also what they are interested in when calculating "unbiased" accuracy. As they state in the paper: "We construct the unbiased evaluation set in a way that the target and bias attributes are uncorrelated."
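If the goal is that group-balanced notion of accuracy, a minimal sketch could look like the following (the function name and signature are mine, not from the repo, and this is only an illustration of the idea, not the paper's exact evaluation code):

```python
import numpy as np

def group_balanced_accuracy(preds, targets, biases):
    """Average the accuracy over every (target, bias) group, weighting groups equally.

    This corresponds to an evaluation set in which target and bias attributes
    are uncorrelated, i.e. every combination is equally represented.
    """
    group_accs = []
    for t in np.unique(targets):
        for b in np.unique(biases):
            mask = (targets == t) & (biases == b)
            if mask.any():
                group_accs.append((preds[mask] == targets[mask]).mean())
    return float(np.mean(group_accs))
```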

zhao1402072392 commented 1 year ago

Hi, could you explain how the "unbiased" accuracy in the paper is calculated? I can't reproduce the results from the paper.