facebookresearch / MetaCLIP

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

How to implement English Language IDentification (LID)? #48

Closed hbchen121 closed 5 months ago

hbchen121 commented 5 months ago

How did you implement the "English Language IDentification (LID)" step described in the paper?

Thanks~ Looking forward to your reply

howardhsu commented 5 months ago

In the paper we used a production LID model, which is not part of this release. In the open-source version you can try fastText instead: https://github.com/facebookresearch/MetaCLIP/blob/43ba9bc254e6eafe03f9dd7fbc79319484b070a6/metaclip/cc_matching.py#L24
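
For readers who want something concrete, here is a minimal sketch of an English filter built on fastText's language-identification model. It is not the production model from the paper, and it is not claimed to match the code at the linked line. The helper names (`load_fasttext_predictor`, `is_english`), the local model path `lid.176.bin`, and the 0.5 confidence threshold are all assumptions made for illustration; the threshold in particular would need tuning on your own data.

```python
from typing import Callable, Sequence, Tuple

# Shape of fastText's `model.predict(text, k=1)` result: (labels, probabilities).
Predictor = Callable[[str], Tuple[Sequence[str], Sequence[float]]]


def load_fasttext_predictor(model_path: str = "lid.176.bin") -> Predictor:
    """Wrap a fastText language-ID model as a simple callable.

    Assumes the pretrained model file has already been downloaded to
    `model_path` and that the `fasttext` package is installed.
    """
    import fasttext  # third-party dependency, imported lazily

    model = fasttext.load_model(model_path)
    return lambda text: model.predict(text, k=1)


def is_english(text: str, predict: Predictor, min_confidence: float = 0.5) -> bool:
    """Return True if the top predicted language is English.

    `min_confidence` is a hypothetical cutoff, not a value from the paper.
    """
    # fastText's predict() rejects newlines, so collapse all whitespace first.
    cleaned = " ".join(text.split())
    if not cleaned:
        return False
    labels, probs = predict(cleaned)
    # fastText labels carry a "__label__" prefix, e.g. "__label__en".
    return labels[0] == "__label__en" and float(probs[0]) >= min_confidence
```

Typical use would be `predict = load_fasttext_predictor()` once, then `is_english(caption, predict)` per alt-text. Passing the predictor in as an argument keeps the filter easy to test with a stub and easy to swap for a different LID model.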