teohsinyee closed this 2 years ago
I couldn't find the answer on the official FastText site, but I found it on the GloVe site, which lists:
Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB download): glove.42B.300d.zip
Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): glove.840B.300d.zip
So the answer is: the dataset is cased. This means the vocabulary contains both uppercase and lowercase forms. I have also uploaded the CSV file to Kaggle: https://www.kaggle.com/datasets/teohsinyee/word-of-common-crawl-cased-300d-vectors
Source: https://www.kaggle.com/datasets/yekenot/fasttext-crawl-300d-2m
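
If you want to confirm this on the file itself rather than rely on the GloVe description, a rough check like the one below might help. It is only a sketch: it assumes the vectors are in the usual text format (one token per line followed by space-separated floats, optionally preceded by a `<vocab_size> <dim>` header line), and the function name `count_cased_tokens` and the sample rows are made up for illustration.

```python
from typing import Iterable, Tuple


def count_cased_tokens(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_tokens, tokens_containing_uppercase) for text-format vectors."""
    total = 0
    with_upper = 0
    for i, line in enumerate(lines):
        parts = line.rstrip("\n").split(" ")
        # Assumption: an optional first line of two integers is a header, not a token.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        token = parts[0]
        if not token:
            continue
        total += 1
        # A token counts as "cased" here if any character is uppercase.
        if any(ch.isupper() for ch in token):
            with_upper += 1
    return total, with_upper


# Tiny made-up sample standing in for the real file.
sample = [
    "4 3\n",
    "the 0.1 0.2 0.3\n",
    "The 0.2 0.1 0.0\n",
    "Paris 0.5 0.4 0.3\n",
    "paris 0.4 0.4 0.2\n",
]
total, with_upper = count_cased_tokens(sample)
print(total, with_upper)
```

To run it on a real download, pass an open file handle instead of `sample`, for example `open(path, encoding="utf-8")` where `path` points at your local copy. If the second number is greater than zero, the vocabulary keeps uppercase forms, which is what "cased" means here.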