Trustworthy-ML-Lab / Label-free-CBM

A new framework to transform any neural network into an interpretable concept bottleneck model (CBM) without needing labeled concept data

Only 998 classes in ImageNet json files #1

Closed by Susmit-A 1 year ago

Susmit-A commented 1 year ago

Hi,

The ImageNet json files have only 998 classes listed as keys. Could you please provide the complete dictionaries? For reference, the "missile" and "sunglasses" classes each appear twice.

Thanks

tuomaso commented 1 year ago

What do you mean by json files? If you are talking about data/imagenet_classes.txt, that is the complete list of 1000 classes. Those two labels each correspond to two separate classes, probably because the dataset was not carefully constructed. For reference, we are using the labels from https://github.com/openai/CLIP/blob/main/notebooks/Prompt_Engineering_for_ImageNet.ipynb , but the same issue persists in other references for the ground-truth class names; see https://deeplearning.cms.waikato.ac.nz/user-guide/class-maps/IMAGENET/ (the names are not identical there, but they are still synonyms).
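A minimal sketch of why a 1000-entry class list can end up as 998 dictionary keys: when two classes share the same label, a dict keyed by label keeps only one entry per label. The short `class_names` list below is toy data standing in for the real list, and `concepts_by_class` is a hypothetical name, not something taken from the repository.

```python
from collections import Counter

# Toy stand-in for the class list (assumption: the real list has 1000 entries,
# with "missile" and "sunglasses" each appearing twice).
class_names = ["tench", "missile", "goldfish", "missile", "sunglasses", "sunglasses"]

# Labels that occur more than once in the list.
duplicates = sorted(name for name, count in Counter(class_names).items() if count > 1)

# A dict keyed by label keeps one entry per distinct label, so every
# repeated label reduces the number of keys by one.
concepts_by_class = {name: [] for name in class_names}

print(duplicates)
print(len(class_names), len(concepts_by_class))
```

With two labels repeated once each, the dict has two fewer keys than the list has entries, which matches the 1000 versus 998 gap described above.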

Susmit-A commented 1 year ago

I meant the imagenet json files in data/concept_sets/gpt3_init/. I've generated the concepts using my API key for the missing two classes, and filtered them using the remainder of your code. After fixing the class list, the generated concepts are slightly different but close enough for most use cases.

tuomaso commented 1 year ago

Okay, thanks for the clarification. It shouldn't matter for that part of the code: the final concepts are generated for the model as a whole and are not saved in connection with any particular class. The classes are just a helpful way of getting there, and predictions for each class can use concepts derived from a different class anyway.
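To illustrate the point about concepts belonging to the model as a whole, here is a rough sketch that pools per-class concept lists into one flat set. The dictionary shape `{class_name: [concept, ...]}`, the example concepts, and the helper `pool_concepts` are all assumptions made for illustration; this is not the repository's actual filtering code.

```python
# Hypothetical per-class concept lists, in the assumed shape
# {class_name: [concept, ...]}.
concepts_by_class = {
    "missile": ["a long, thin body", "fins"],
    "goldfish": ["fins", "orange scales"],
}


def pool_concepts(per_class):
    """Merge per-class lists into one de-duplicated, sorted concept list."""
    pooled = set()
    for concepts in per_class.values():
        pooled.update(concepts)
    return sorted(pooled)


# The pooled list no longer records which class each concept came from.
concept_set = pool_concepts(concepts_by_class)
print(concept_set)
```

Once pooled, a concept such as "fins" is available to every class regardless of which class prompt produced it, so a missing key only matters if its concepts appear nowhere else.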

Adasunnylily commented 1 year ago

> I meant the imagenet json files in data/concept_sets/gpt3_init/. I've generated the concepts using my API key for the missing two classes, and filtered them using the remainder of your code. After fixing the class list, the generated concepts are slightly different but close enough for most use cases.

Hi, could you please tell me which two classes were missing?