ddangelov / Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.
BSD 3-Clause "New" or "Revised" License

Getting invalid sentenceBERT model & batch_size #249

Closed: kunal-bhadra closed this issue 2 years ago

kunal-bhadra commented 2 years ago

I am trying to use Top2Vec with the pretrained embedding models. universal-sentence-encoder did not work because, for some reason, embedding_batch_size was reported as invalid for it, so I switched to an SBERT model. The corpus is English only, so I tried to load all-MiniLM-L6-v2, but I got an error saying that all-MiniLM-L6-v2 is an invalid embedding model. Can anybody explain why these issues are happening?

Edit: On further investigation of the Top2Vec.py file, I found that the only accepted models are:

This contradicts the official API documentation, which lists 8 models, including all-MiniLM-L6-v2. What is the reason for this difference, and which models are actually supported?

ddangelov commented 2 years ago

This is because the latest code isn't on PyPI yet. I will be releasing a new version, 1.0.27, shortly.
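Because the mismatch comes down to which release is installed, a quick local check can tell whether you are still on a version that predates the newer model list. The sketch below is a minimal, hypothetical helper, not part of Top2Vec. It assumes that 1.0.27 is the first PyPI release carrying the expanded model list, as the reply above indicates. It also assumes the module sits at `top2vec/Top2Vec.py` next to the package's `__init__.py`. The helper names (`parse_release`, `includes_fix`, `installed_release`, `source_mentions`) are illustrative only. Version parsing is deliberately simplified and is not a full PEP 440 implementation.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec
from pathlib import Path
from typing import Optional, Tuple

# Assumption: 1.0.27 is the first PyPI release with the expanded model list,
# per the maintainer's reply in this issue.
FIX_RELEASE = (1, 0, 27)


def parse_release(text: str) -> Tuple[int, int, int]:
    """Simplified parser: '1.0.26' -> (1, 0, 26); stops at the first non-digit in each part."""
    numbers = []
    for part in text.split(".")[:3]:
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:  # pad short versions such as '1.0'
        numbers.append(0)
    return (numbers[0], numbers[1], numbers[2])


def includes_fix(installed: str) -> bool:
    """True if the installed release is at or after FIX_RELEASE."""
    return parse_release(installed) >= FIX_RELEASE


def installed_release() -> Optional[str]:
    """Installed top2vec version string, or None if the package is absent."""
    try:
        return version("top2vec")
    except PackageNotFoundError:
        return None


def source_mentions(model_name: str) -> Optional[bool]:
    """Search the installed Top2Vec.py for a model name, as done in the issue.

    Assumes the module lives at top2vec/Top2Vec.py; returns None if the
    package or the file cannot be found.
    """
    spec = find_spec("top2vec")
    if spec is None or spec.origin is None:
        return None
    path = Path(spec.origin).parent / "Top2Vec.py"
    if not path.is_file():
        return None
    return model_name in path.read_text(encoding="utf-8")


if __name__ == "__main__":
    release = installed_release()
    if release is None:
        print("top2vec is not installed")
    else:
        print(f"top2vec {release}; expected to include the fix: {includes_fix(release)}")
        print("all-MiniLM-L6-v2 found in Top2Vec.py:", source_mentions("all-MiniLM-L6-v2"))
```

If the check reports an older release, upgrading once the new version is published (for example with `pip install --upgrade top2vec`) should pick up the models listed in the documentation. Searching the installed source mirrors the manual investigation described in the issue, so it still works if the version number alone is misleading.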