facebookresearch / StarSpace

Learning embeddings for classification, retrieval and ranking.
MIT License
3.94k stars 531 forks

Slow query_predict predictions #172

Closed dsalfran closed 6 years ago

dsalfran commented 6 years ago

Currently, after training a model with StarSpace we obtain two files: one with the model and one .tsv file with a dictionary of embedding vectors. My model was trained with 300 dimensions, and the vocabulary in the dictionary is about 300k words.

When calling the query_predict binary I must provide a basedocs file which contains approximately 300k sentences, one per line. My issue is that obtaining predictions takes too long.

  1. What exactly does the query_predict binary do?
  2. Which factors influence the speed of the predictions?
  3. Would it be possible to use just the dictionary of vectors with another software library, like fastText or gensim, to obtain faster predictions?
ledw commented 6 years ago

@dsalfran Hi, thanks for the question. query_predict takes each input, considers each sentence in basedocs, and makes a prediction. In other words, the number of sentences in basedocs affects the prediction speed. You can lower that number if you want faster predictions (in our paper we usually use 10k sentences as basedocs). For 3: yes, the .tsv format of the model is meant to be easy to use in other software that accepts standard tsv-format embedding files.
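
Since the .tsv file is just one word per line followed by its vector components, a minimal retrieval pass can be done outside StarSpace. The sketch below is an illustrative assumption, not part of StarSpace itself: it embeds a sentence as the mean of its word vectors and ranks candidate basedocs by cosine similarity in one vectorized NumPy pass (the mock TSV content and helper names are hypothetical).

```python
# Sketch: using StarSpace-style .tsv embeddings directly for retrieval,
# bypassing query_predict. Assumes lines of the form "word\tv1\tv2\t...".
import io
import numpy as np

def load_tsv(fileobj):
    """Load a word -> vector dictionary from a tsv-format embedding file."""
    vecs = {}
    for line in fileobj:
        parts = line.rstrip("\n").split("\t")
        vecs[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vecs

def embed(text, vecs):
    """Embed a sentence as the mean of its known word vectors."""
    words = [vecs[w] for w in text.split() if w in vecs]
    dim = len(next(iter(vecs.values())))
    return np.mean(words, axis=0) if words else np.zeros(dim, dtype=np.float32)

def rank(query, docs, vecs):
    """Return doc indices sorted by cosine similarity to the query."""
    q = embed(query, vecs)
    d = np.stack([embed(doc, vecs) for doc in docs])
    sims = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q) + 1e-8)
    return np.argsort(-sims)

# Tiny mock of a model .tsv so the sketch is self-contained.
mock_tsv = io.StringIO("cat\t1.0\t0.0\ndog\t0.9\t0.1\ncar\t0.0\t1.0\n")
vecs = load_tsv(mock_tsv)
docs = ["car car", "cat dog"]
order = rank("cat", docs, vecs)
print(docs[order[0]])  # prints "cat dog", the closest basedoc to "cat"
```

Because the doc embeddings can be precomputed and stacked once, each query is a single matrix-vector product, which is typically much faster than rescoring 300k sentences per query inside query_predict.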