google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Apache License 2.0

Question: Updating mask in classification evaluator #42

Closed by MasterSkepticista 10 months ago

MasterSkepticista commented 11 months ago

For evaluation datasets, a padding mask is built here with _mask=0 for the fake (padding) examples, which also implicitly sets the label of each fake example to a vector of all zeros. https://github.com/google-research/big_vision/blob/184d1201eb34abe7da84fc69f84fd89a06ad43c4/big_vision/input_pipeline.py#L149

Why is there a need to update the mask here?

https://github.com/google-research/big_vision/blob/184d1201eb34abe7da84fc69f84fd89a06ad43c4/big_vision/evaluators/classification.py#L39

lucasb-eyer commented 11 months ago

This is for some rare classification datasets where some examples have no label at all. An example is our ImageNet-ReaL dataset, which has ~3000 validation images with no label. According to the official metric, these should be ignored rather than always scored as wrong, which is what would happen if the mask weren't updated.
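
To make the effect concrete, here is a minimal NumPy sketch of the idea. It is not the evaluator's actual code: the function name `masked_top1_accuracy`, the `ignore_unlabeled` flag, and the toy data are all made up for illustration, and it assumes multi-hot labels where an all-zero row means "no label".

```python
import numpy as np


def masked_top1_accuracy(logits, labels, mask, ignore_unlabeled=True):
    """Top-1 accuracy over the examples selected by `mask`.

    Hypothetical helper for illustration only. `labels` is multi-hot with
    shape (batch, classes); `mask` is 1.0 for real examples and 0.0 for
    padding.
    """
    if ignore_unlabeled:
        # An all-zero label row has max 0, so this also masks out real
        # examples that carry no label.
        mask = mask * labels.max(axis=1)
    top1_idx = np.argmax(logits, axis=1)
    # 1.0 where the predicted class is one of the example's labels.
    top1_correct = np.take_along_axis(labels, top1_idx[:, None], axis=1)[:, 0]
    n = mask.sum()
    return float((top1_correct * mask).sum() / n) if n > 0 else 0.0


logits = np.array([
    [2.0, 0.1, 0.3],  # real, labeled, predicts class 0 (correct)
    [0.2, 0.1, 3.0],  # real, labeled, predicts class 2 (wrong)
    [0.5, 1.5, 0.1],  # real image with no label
    [0.0, 0.0, 0.0],  # padding example
])
labels = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
mask = np.array([1.0, 1.0, 1.0, 0.0])

acc_updated = masked_top1_accuracy(logits, labels, mask)
acc_naive = masked_top1_accuracy(logits, labels, mask, ignore_unlabeled=False)
print(acc_updated, acc_naive)
```

With the mask update, the unlabeled image drops out of the denominator (1 correct out of 2). Without it, the same image counts as an error (1 correct out of 3), even though no prediction could have been scored as right.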