LAION-AI / CLIP_benchmark

CLIP-like model evaluation
MIT License

Overwritten zeroshot templates for webdataset eval #109

Closed djghosh13 closed 7 months ago

djghosh13 commented 8 months ago

#98 added better support for custom zeroshot prompt templates. However, the implementation conflicts with webdataset datasets (e.g. "wds/cifar10") stored on Hugging Face or locally. Specifically:

  1. For zeroshot classification, templates are loaded from en_zeroshot_classification_templates.json
  2. If the dataset name is not recognized, the templates are set to IN1k templates
  3. build_wds_dataset loads classnames and templates from the source (local filesystem or Huggingface repo) (line 731)
  4. ds.templates = templates overwrites the templates on line 45

This affects all wds datasets, though it probably hasn't changed results in the majority of cases, where the dataset name and templates match the defaults in en_zeroshot_classification_templates.json.
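The sequence above can be reproduced in a few lines. This is only a minimal sketch: `DEFAULT_TEMPLATES`, `load_default_templates`, and `build_wds_dataset_stub` are hypothetical stand-ins invented for illustration, not the actual CLIP_benchmark functions, and the template strings are made up.

```python
from types import SimpleNamespace

# Hypothetical stand-in for en_zeroshot_classification_templates.json.
DEFAULT_TEMPLATES = {
    "imagenet1k": ["a photo of a {c}."],
}


def load_default_templates(dataset_name):
    # Steps 1-2: look up by dataset name, fall back to the IN1k templates.
    return DEFAULT_TEMPLATES.get(dataset_name, DEFAULT_TEMPLATES["imagenet1k"])


def build_wds_dataset_stub():
    # Step 3: a webdataset ships its own classnames and templates (stubbed here).
    return SimpleNamespace(
        classes=["airplane", "automobile"],
        templates=["a blurry photo of a {c}."],
    )


# "wds/cifar10" is not a key in the defaults, so the IN1k fallback is used.
templates = load_default_templates("wds/cifar10")
ds = build_wds_dataset_stub()
shipped = list(ds.templates)

# Step 4: the unconditional assignment discards the templates shipped with the dataset.
ds.templates = templates
```

After the last line, `ds.templates` holds the fallback templates and the ones loaded from the dataset source are lost.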

A fix should probably either:

mehdidc commented 8 months ago

Thanks a lot @djghosh13! I hadn't noticed this. I think it would make sense to overwrite only if the custom option is provided (second option). I will open a PR and mention the issue in datacomp as well. So the order of precedence would be:

  1. templates from --custom_template_file (if provided)
  2. the dataset's existing templates (if it has any, as in the WDS case or Babel ImageNet)
  3. templates from <LANG>_zeroshot_classification_templates.json if the dataset name is in there, otherwise the imagenet1k templates
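A rough sketch of that precedence order, assuming each source has already been loaded into a plain Python object. The function name `resolve_templates` and its parameters are hypothetical, chosen for illustration, and do not reflect the code in the eventual fix.

```python
def resolve_templates(dataset_name, default_templates,
                      custom_templates=None, dataset_templates=None):
    """Pick zeroshot templates following the proposed order of precedence.

    default_templates: dict mapping dataset name -> list of templates
        (assumed to contain an "imagenet1k" entry as the fallback).
    custom_templates: templates from a user-supplied file, or None.
    dataset_templates: templates shipped with the dataset itself, or None.
    """
    # 1. Templates explicitly supplied by the user always win.
    if custom_templates:
        return custom_templates
    # 2. Otherwise keep whatever the dataset ships (e.g. a webdataset).
    if dataset_templates:
        return dataset_templates
    # 3. Otherwise use the per-dataset defaults, or the imagenet1k fallback.
    if dataset_name in default_templates:
        return default_templates[dataset_name]
    return default_templates["imagenet1k"]


defaults = {"imagenet1k": ["a photo of a {c}."], "cifar10": ["a pixelated {c}."]}

# A webdataset with its own templates and no custom file keeps its own templates.
chosen = resolve_templates(
    "wds/cifar10", defaults, dataset_templates=["a blurry photo of a {c}."]
)
```

With this ordering, a dataset's own templates are replaced only when the user explicitly provides custom ones.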

The other thing I am thinking of is to (optionally) write the classnames/templates that ended up being used into the JSON dump, for full transparency.
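One way this could look, as a sketch only: the function name `build_result_dump`, the `include_prompts` switch, and the key names are assumptions made for illustration, not the benchmark's actual output schema.

```python
import json


def build_result_dump(metrics, classnames, templates, include_prompts=True):
    """Serialize results, optionally recording the prompts that were used."""
    dump = {"metrics": metrics}
    if include_prompts:
        # Record exactly what was fed to the zeroshot classifier.
        dump["classnames"] = list(classnames)
        dump["templates"] = list(templates)
    return json.dumps(dump, indent=2)


# Illustrative values only, not real benchmark results.
payload = build_result_dump(
    metrics={"acc1": 0.5},
    classnames=["airplane", "automobile"],
    templates=["a photo of a {c}."],
)
```

Reading the dump back with `json.loads` then shows which classnames and templates produced the reported metrics.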

mehdidc commented 7 months ago

Fixed.