dchen236 / FairFace


Accuracy difference between train and val datasets and paper #28


zoltanfarkasgis commented 1 year ago

Out of curiosity, we took the 0.25 datasets (train and val) and ran them through the ResNet34 model and weights trained by the authors. This 0.25 dataset is the one that the face-alignment step in predict.py produces, so we assume it is equivalent to the one used and referred to in the paper for the 7 race classes.
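For reference, here is a minimal sketch of how the aligned crops can be scored with the published checkpoint. The file name `res34_fair_align_multi_7_20190809.pt`, the 18-logit head (7 race + 2 gender + 9 age), and the race label order are taken from the repo's predict.py; adjust them if your checkout differs.

```python
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ResNet34 with an 18-way head: 7 race + 2 gender + 9 age logits (as in predict.py)
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 18)
model.load_state_dict(
    torch.load("fair_face_models/res34_fair_align_multi_7_20190809.pt",
               map_location=device))
model = model.to(device).eval()

trans = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed label order, matching predict.py
RACE_7 = ["White", "Black", "Latino_Hispanic", "East Asian",
          "Southeast Asian", "Indian", "Middle Eastern"]

def predict_race(img_path):
    """Return the predicted 7-class race label for one aligned (padding=0.25) crop."""
    img = trans(Image.open(img_path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        logits = model(img)
    race_idx = logits[0, :7].argmax().item()  # the first 7 logits are the race classes
    return RACE_7[race_idx]
```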

Interestingly, the accuracy (the match rate between the labels in the original csv and the classes predicted by the published model) differs between the train and val (test) datasets, and both are lower than reported in the paper: Train:

BTW, as previously discovered by fellow commenters, the filter service_test == True defines a subset in which the labels are balanced in terms of race and gender. Therefore, we calculated metrics both for the full set and for this subset.
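As an illustration, the comparison was done roughly as below. The label columns (`file`, `race`, `service_test`) are those of the published fairface_label_*.csv files; the `predicted_races_val.csv` file and its `predicted_race` column are hypothetical placeholders for whatever stores the model outputs from the sketch above.

```python
import pandas as pd

# Published label file: columns file, age, gender, race, service_test
labels = pd.read_csv("fairface_label_val.csv")

# Hypothetical: one predicted race label per aligned file (columns: file, predicted_race)
preds = pd.read_csv("predicted_races_val.csv")
merged = labels.merge(preds, on="file")

# Accuracy on the full validation set
full_acc = (merged["race"] == merged["predicted_race"]).mean()

# Accuracy on the race/gender-balanced subset
balanced = merged[merged["service_test"] == True]
balanced_acc = (balanced["race"] == balanced["predicted_race"]).mean()

print(f"full val accuracy:     {full_acc:.4f}")
print(f"service_test accuracy: {balanced_acc:.4f}")
```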

We would have expected higher and more consistent percentages.

Please feel free to correct any inaccuracy or misinterpretation above or provide an explanation.