commonsense / conceptnet-numberbatch

Sorting by occurrence count #50

Closed pzelasko closed 6 years ago

pzelasko commented 6 years ago

Hi guys! Do you think you could provide the conceptnet-numberbatch embeddings sorted by some kind of word frequency, the way GloVe and FastText do? In my research I limit the vocabulary to the K most frequent words so that the embedding lookup doesn't eat all the GPU memory when I use pretrained embeddings in my models, and the sort order used by the other embeddings makes this much easier.

jlowryduda commented 6 years ago

Hi! The problem with sorting Numberbatch is that it includes phrases, and it's difficult to find information on their frequencies. However, if you're only interested in single words, you could sort it yourself using the wordfreq library.
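
A rough sketch of what that sorting could look like, not an official Numberbatch tool. It assumes the English-only text file, where each row is `term v1 v2 ... vN` and phrases are joined with underscores, and that any header line has already been dropped. The function name `sort_rows_by_frequency` and the `freq` callback are made up for illustration; with wordfreq installed, `freq` could be `lambda w: word_frequency(w, "en")`.

```python
from typing import Callable, Iterable, List, Optional


def sort_rows_by_frequency(
    rows: Iterable[str],
    freq: Callable[[str], float],
    top_k: Optional[int] = None,
) -> List[str]:
    """Return embedding rows ordered from most to least frequent term.

    Hypothetical helper: each row is assumed to be "term v1 v2 ... vN".
    Terms containing an underscore are treated as phrases and skipped,
    since frequencies for them are hard to come by.
    """
    scored = []
    for row in rows:
        term = row.split(" ", 1)[0]
        if "_" in term:
            continue  # skip multi-word phrases
        scored.append((freq(term), row))
    # The sort is stable, so rows with equal frequency keep their file order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    kept = [row for _, row in scored]
    return kept if top_k is None else kept[:top_k]


if __name__ == "__main__":
    # Toy frequencies stand in for wordfreq so the sketch runs without it.
    toy_freq = {"the": 0.05, "cat": 0.0001}
    rows = ["cat 0.1 0.2", "ice_cream 0.3 0.4", "zyzzyva 0.5 0.6", "the 0.7 0.8"]
    for row in sort_rows_by_frequency(rows, lambda w: toy_freq.get(w, 0.0), top_k=2):
        print(row)
```

Passing the frequency lookup in as a callback keeps the sorting independent of any particular frequency source, and `top_k` gives the truncated vocabulary directly.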

pzelasko commented 6 years ago

That's a fair point. Thanks for your suggestion, I'll try it out.