djghosh13 commented 8 months ago

#98 added better support for custom zeroshot prompt templates. However, the implementation conflicts with webdataset datasets (e.g. "wds/cifar10") stored on Huggingface or locally. Specifically:

1. For zeroshot classification, templates are loaded from en_zeroshot_classification_templates.json
2. If the dataset name is not recognized, the templates are set to IN1k templates
3. build_wds_dataset loads classnames and templates from the source (local filesystem or Huggingface repo) (line 731)
4. ds.templates = templates overwrites the templates on line 45

This affects all wds datasets, though it probably hasn't changed results in the majority of cases, where the dataset name and templates match the defaults in en_zeroshot_classification_templates.json.

A fix should probably either:

1. Never overwrite ds.templates if it already exists; or
2. Overwrite ds.templates only if the custom_template_file parameter is set (i.e. custom template takes precedence over webdataset template)

mehdidc commented 8 months ago

Thanks a lot @djghosh13! I hadn't noticed this. I think it makes sense to overwrite only if the custom option is provided (second option). I will open a PR and mention the issue in datacomp as well. So the order of precedence would be:

1 - templates from --custom_template_file (if provided).
2 - the dataset's existing templates (if it has any, as in the WDS case or Babel ImageNet).
3 - templates from <LANG>_zeroshot_classification_templates.json if the dataset name is in there, otherwise the imagenet1k templates.
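
To make the precedence order above concrete, here is a minimal sketch in Python. It is not the code from the actual fix: the function name `resolve_templates`, its parameters, and the placeholder fallback list are all hypothetical, and the per-language JSON file is assumed to map dataset names to lists of template strings.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical stand-in for the imagenet1k fallback templates (the real list is longer).
FALLBACK_TEMPLATES = ["a photo of a {c}."]


def resolve_templates(
    dataset_name: str,
    custom_templates: Optional[Sequence[str]] = None,
    dataset_templates: Optional[Sequence[str]] = None,
    default_templates: Optional[Mapping[str, Sequence[str]]] = None,
    fallback_templates: Sequence[str] = FALLBACK_TEMPLATES,
) -> list:
    """Pick zeroshot templates using the three-level precedence described above."""
    # 1 - templates from the custom template file always win when provided.
    if custom_templates:
        return list(custom_templates)
    # 2 - otherwise keep the templates shipped with the dataset (e.g. a WDS dataset).
    if dataset_templates:
        return list(dataset_templates)
    # 3 - otherwise look the dataset name up in the per-language defaults,
    #     falling back to the imagenet1k-style templates if it is missing.
    defaults = default_templates or {}
    return list(defaults.get(dataset_name, fallback_templates))
```

With this ordering, a webdataset that ships its own templates keeps them unless the user explicitly passes a custom template file, which is the behavior proposed as the second option.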

The other thing I am thinking of is to (optionally) write the classnames/templates that ended up being used into the JSON dump, for full transparency.
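
The transparency idea could look roughly like the sketch below. The function name `build_result_dump` and the key names `metrics`, `classnames`, and `templates` are assumptions for illustration, not the actual output schema of the benchmark.

```python
import json
from typing import Mapping, Sequence


def build_result_dump(
    metrics: Mapping[str, float],
    classnames: Sequence[str],
    templates: Sequence[str],
    include_prompts: bool = True,
) -> str:
    """Serialize evaluation results, optionally with the prompts that were actually used."""
    payload = {"metrics": dict(metrics)}
    if include_prompts:
        # Record the resolved classnames/templates so a reader of the dump can
        # see exactly which prompts produced the reported numbers.
        payload["classnames"] = list(classnames)
        payload["templates"] = list(templates)
    return json.dumps(payload, indent=2)
```

Storing the resolved values, rather than the name of the file they came from, means the dump stays accurate even when one source overrides another.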

mehdidc commented 7 months ago

Fixed.

LAION-AI / CLIP_benchmark

Overwritten zeroshot templates for webdataset eval #109
