facebookresearch / CiT

Code for the paper titled "CiT: Curation in Training for Efficient Vision-Language Data".

the core idea #3

Open huang-xx opened 1 year ago

huang-xx commented 1 year ago

The core idea of your paper is to select, from noisy datasets, training data that is similar to the metadata.

One of my questions is: when you use ImageNet's labels as metadata, isn't this just selecting data belonging to specific ImageNet categories from the noisy dataset to participate in the training process?

So what this paper does is select data close to the ImageNet distribution from the noisy dataset, train the model on it, and then compare the results on ImageNet against other models such as CLIP, and on that basis claim your method trains faster and performs better?

howardhsu commented 1 year ago

Thanks for your interest.

The curation in this paper differs from CLIP in two ways: (1) CLIP's WIT-400M is built from a much larger set of metadata (queries), including WordNet (so it covers the IN labels), whereas CiT so far uses smaller metadata sets for efficiency (see Table 7: not just IN-1K, but also IN-21K, 26 tasks combined, etc.). (2) WIT-400M is built offline, whereas CiT performs online, model-based curation at the semantic level (Table 8). Note that CiT is not substring matching or hard class assignment, and the metadata is NOT used for training. This is why Table 7 shows that even IN-1K metadata generalizes to other tasks.
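To make the distinction concrete, here is a minimal sketch of what "online, model-based curation at the semantic level" could look like, as opposed to substring matching: keep the image-text pairs whose text embedding is close to some metadata embedding. The function name, array shapes, and threshold below are illustrative assumptions for this sketch, not CiT's actual API.

```python
import numpy as np

def curate(text_embs, meta_embs, threshold=0.3):
    """Return indices of texts whose maximum cosine similarity to
    any metadata embedding meets the threshold.

    Hypothetical helper for illustration; CiT's real curation is
    defined in the paper/repo, not here.
    """
    # Normalize rows to unit length so dot products are cosine similarities.
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    m = meta_embs / np.linalg.norm(meta_embs, axis=1, keepdims=True)
    sims = t @ m.T                      # shape: (num_texts, num_metadata)
    keep = sims.max(axis=1) >= threshold
    return np.nonzero(keep)[0]

# Toy example: 3 text embeddings, 2 metadata embeddings in 4-d space.
texts = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0, 0.0]])
meta = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
print(curate(texts, meta, threshold=0.5))  # → [0 2]
```

Because the match is on embedding similarity rather than exact class strings, a caption like "a photo of my golden retriever" can be curated by dog-related metadata even though no ImageNet label appears verbatim in it, and the metadata itself never enters the training loss.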

Hope this answered your question.