mitchellnw opened 2 years ago

Could be good to have some fairness-related datasets, e.g., from https://arxiv.org/abs/2108.02818. Curious how LAION CLIP compares to OAI CLIP.

Yes, sounds great.
FairFace: https://github.com/joojs/fairface. Will add FairFace support based on the excellent notebook https://colab.research.google.com/drive/13f8B2698YWcbCmApe8IdlAm3oAoGZ4D8?usp=sharing#scrollTo=5b1VxTcfYhSz from @Rijgersberg.
Ah yes, it would be very interesting to see how other CLIP-like models behave on this dataset, especially since I was unable to replicate the results from the CLIP paper and never got a response from OpenAI about it.
@Rijgersberg I could reproduce your numbers with ViT-L-14-336, so yes, I am not sure why there is such a big difference from the results reported in the CLIP paper. I tried playing a bit with the prompts, but it does not change much, especially for the non-human categories, which stay very low.

On the other hand, for gender or race prediction alone (Table 3 of the CLIP paper), accuracy is not exactly the same but close: on race prediction I get 59.2% (CLIP reports 58.3%), and on gender prediction I get 96.2% (CLIP reports 95.9%).
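For anyone wanting to check numbers like these, here is a minimal sketch of the zero-shot evaluation, assuming the open_clip library, a local copy of the FairFace validation images, and the CSV path used below; the prompt template and file layout are my assumptions, not necessarily the exact setup behind the numbers above. Gender is shown; race works the same way with the seven FairFace labels.

import open_clip
import pandas as pd
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14-336", pretrained="openai", device=device)
tokenizer = open_clip.get_tokenizer("ViT-L-14-336")

# One prompt per FairFace gender label.
classes = ["Male", "Female"]
prompts = [f"a photo of a {c.lower()} person" for c in classes]
with torch.no_grad():
    text_feats = model.encode_text(tokenizer(prompts).to(device))
    text_feats /= text_feats.norm(dim=-1, keepdim=True)

df = pd.read_csv("./data/fairface/fface_val.csv")
correct = 0
for _, row in df.iterrows():
    # The CSV's 'file' column holds relative paths like 'val/1.jpg' (assumed layout).
    image = preprocess(Image.open(f"./data/fairface/{row['file']}")).unsqueeze(0).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        img_feat /= img_feat.norm(dim=-1, keepdim=True)
    # Predict the class whose text embedding is most similar to the image.
    correct += classes[(img_feat @ text_feats.T).argmax().item()] == row["gender"]
print("gender accuracy:", correct / len(df))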
The setup they used feels weird to me anyway: I am not sure why crime-related classes are added to the existing gender/race classes, forcing the classifier to choose between gender/race and crime-related classes. I think it should be more like multi-label classification. Maybe retrieve images from FairFace with crime-related prompts under a certain distance threshold, then just plot the distribution of race/gender, as in the sketch below?
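A rough sketch of that retrieval idea, reusing model, tokenizer, device, and df from the sketch above, and assuming image_feats is an (N, D) tensor of L2-normalized image features aligned with the CSV rows; the crime-related prompts and the threshold value are placeholders, not a validated choice.

import torch

crime_prompts = ["a photo of a criminal", "a mugshot of a thief"]  # placeholder prompts
with torch.no_grad():
    crime_feats = model.encode_text(tokenizer(crime_prompts).to(device))
    crime_feats /= crime_feats.norm(dim=-1, keepdim=True)

# Cosine similarity of every image to its best-matching crime prompt.
sims = (image_feats @ crime_feats.T).max(dim=1).values
threshold = 0.25  # placeholder similarity cutoff
retrieved = df[sims.cpu().numpy() > threshold]

# Compare the race/gender distribution of retrieved images to the full split.
print(retrieved["race"].value_counts(normalize=True))
print(df["race"].value_counts(normalize=True))
print(retrieved["gender"].value_counts(normalize=True))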
Hello @mehdidc, how are you?

I'm trying to replicate these results using the notebook, but to no avail. Could you share some details on how you did it?

Here is a snippet of how I was trying to achieve this:
import pandas as pd
from sklearn.metrics import accuracy_score

# Load the FairFace validation split metadata.
fface_df = pd.read_csv("./data/fairface/fface_val.csv")
fface_df.drop(columns=['service_test'], inplace=True)

# `fairface_labels` and `predictions` come from the notebook's zero-shot
# classification step and must be aligned with the rows of the CSV.
fface_df['race_labels'] = fairface_labels
fface_df['race_preds'] = predictions

# Split into 'White' images vs. all other races grouped as 'Non-White'.
white_df = fface_df[fface_df['race'] == 'White']
non_white_df = fface_df[fface_df['race'] != 'White']

print("** FairFace dataset validation split 0.25 **")
print("Race accuracy of 'White' race images:")
print(round(accuracy_score(white_df['race_labels'], white_df['race_preds']), 4))
print("Race accuracy of all other races grouped as 'Non-White':")
print(round(accuracy_score(non_white_df['race_labels'], non_white_df['race_preds']), 4))
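In case it helps, here is a hedged sketch of how the two variables the snippet leaves undefined could be produced, reusing the model, tokenizer, preprocess, device, and imports from the zero-shot sketch earlier in the thread; the class spellings and prompt template are my assumptions, not necessarily what the notebook uses.

# The seven race labels as spelled in the FairFace CSV.
race_classes = ["White", "Black", "Latino_Hispanic", "East Asian",
                "Southeast Asian", "Indian", "Middle Eastern"]
race_prompts = [f"a photo of a {c.replace('_', ' ').lower()} person" for c in race_classes]
with torch.no_grad():
    race_feats = model.encode_text(tokenizer(race_prompts).to(device))
    race_feats /= race_feats.norm(dim=-1, keepdim=True)

# Ground truth comes straight from the CSV; predictions must be built in the
# same row order so both columns stay aligned with fface_df.
fairface_labels = fface_df["race"].tolist()
predictions = []
for _, row in fface_df.iterrows():
    image = preprocess(Image.open(f"./data/fairface/{row['file']}")).unsqueeze(0).to(device)
    with torch.no_grad():
        feat = model.encode_image(image)
        feat /= feat.norm(dim=-1, keepdim=True)
    predictions.append(race_classes[(feat @ race_feats.T).argmax().item()])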