dsalfran closed this issue 6 years ago.
@dsalfran Hi, thanks for the question. `query_predict` takes each input, considers each sentence in `basedocs`, and makes a prediction. In other words, the number of sentences in `basedocs` directly affects prediction speed. You can try lowering it if you want faster predictions (in our paper we usually use 10k sentences as basedocs). For 3: yes, the .tsv format of the model is meant to be easy to use in other software that takes standard TSV-format models.
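The point about `basedocs` size can be illustrated with a minimal brute-force sketch. It assumes that a sentence is embedded as the mean of its word vectors and that candidates are ranked by cosine similarity. `embed`, `embed_basedocs`, `predict`, and `subsample` are hypothetical helpers, not StarSpace APIs, and StarSpace's own C++ `query_predict` may differ in detail. What the sketch shows is that every query is scored against every basedoc, so per-query cost is linear in the number of basedocs.

```python
import heapq
import math
import random

def embed(sentence, vectors, dim):
    """Mean of the vectors of known tokens; zero vector if none are known (assumption)."""
    total = [0.0] * dim
    count = 0
    for tok in sentence.split():
        vec = vectors.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        for i, x in enumerate(vec):
            total[i] += x
        count += 1
    return [x / count for x in total] if count else total

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def embed_basedocs(basedocs, vectors, dim):
    """Embed every basedoc once, up front, so queries only pay for scoring."""
    return [(doc, embed(doc, vectors, dim)) for doc in basedocs]

def predict(query, embedded_basedocs, vectors, dim, k=5):
    """Score the query against every basedoc: one similarity per basedoc,
    so the per-query cost grows linearly with len(embedded_basedocs)."""
    q = embed(query, vectors, dim)
    best = heapq.nlargest(k, embedded_basedocs, key=lambda item: cosine(q, item[1]))
    return [doc for doc, _ in best]

def subsample(basedocs, n, seed=0):
    """Keep at most n basedocs (e.g. 10_000 instead of 300k) to trade coverage for speed."""
    if len(basedocs) <= n:
        return list(basedocs)
    return random.Random(seed).sample(basedocs, n)

if __name__ == "__main__":
    vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
    docs = embed_basedocs(["cat", "car", "dog"], vectors, dim=2)
    print(predict("cat", docs, vectors, dim=2, k=2))  # ['cat', 'dog']
```

Under these assumptions, 300k basedocs of 300 dimensions cost about 300,000 × 300 = 90 million multiply-adds per query, while 10k basedocs cost about 3 million, roughly 30× fewer.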
Currently, after training a model with StarSpace we obtain two files: one with the model and one .tsv file with a dictionary of embedding vectors. My model was trained with 300 dimensions, and the vocabulary in the dictionary is about 300k words.
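For a sense of scale, here is a small sketch that reads such a .tsv and estimates the raw size of the embedding matrix. It assumes each line has the form `token<TAB>v1<TAB>v2...`, which is an assumption about the file layout, and it assumes float32 storage for the size estimate. `parse_tsv_line`, `load_tsv`, and `embedding_bytes` are hypothetical helpers.

```python
def parse_tsv_line(line):
    """Split 'token<TAB>v1<TAB>v2...' into (token, [floats]) (assumed layout)."""
    parts = line.rstrip("\n").split("\t")
    return parts[0], [float(x) for x in parts[1:]]

def load_tsv(lines):
    """Build a token -> vector dict, skipping blank lines."""
    vectors = {}
    for line in lines:
        if not line.strip():
            continue
        token, vec = parse_tsv_line(line)
        vectors[token] = vec
    return vectors

def embedding_bytes(vocab_size, dim, bytes_per_value=4):
    """Raw size of a dense vocab_size x dim matrix (float32 by default)."""
    return vocab_size * dim * bytes_per_value

if __name__ == "__main__":
    print(load_tsv(["cat\t0.5\t-1\n"]))     # {'cat': [0.5, -1.0]}
    print(embedding_bytes(300_000, 300))    # 360000000
```

For 300k words × 300 dimensions this gives 360,000,000 bytes (about 360 MB) as raw float32; plain Python lists of floats take considerably more memory than that.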
When calling the `query_predict` binary I must provide a `basedocs` file, which contains approximately 300k sentences, one per line. My issue is that obtaining predictions takes too long. […] `query_predict` does? […] `fasttext` or `gensim` to obtain faster predictions?
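One way to try the .tsv in `gensim` is to convert it to the word2vec text format, which starts with a `count dim` header line followed by space-separated rows. This is a minimal sketch that assumes the .tsv is tab-separated with the token in the first column and that no token contains a space; `tsv_to_word2vec_text` and `convert_file` are hypothetical helpers. Whether this actually yields faster sentence-level predictions than `query_predict` is not established here.

```python
def tsv_to_word2vec_text(tsv_lines):
    """Return word2vec text-format lines: a 'count dim' header, then space-separated rows."""
    rows = []
    dim = None
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        token, values = parts[0], parts[1:]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"inconsistent dimension for token {token!r}")
        rows.append(" ".join([token] + values))
    return [f"{len(rows)} {dim or 0}"] + rows

def convert_file(src, dst):
    """Convert a StarSpace-style .tsv file at src into word2vec text format at dst."""
    with open(src, encoding="utf-8") as fin:
        out = tsv_to_word2vec_text(fin)
    with open(dst, "w", encoding="utf-8") as fout:
        fout.write("\n".join(out) + "\n")

if __name__ == "__main__":
    print(tsv_to_word2vec_text(["a\t1\t2\n", "b\t3\t4\n"]))  # ['2 2', 'a 1 2', 'b 3 4']
```

The converted file can then be loaded with `gensim.models.KeyedVectors.load_word2vec_format(path, binary=False)`. That gives word-level lookups such as `most_similar`. Scoring whole sentences against basedocs would still require averaging (or otherwise combining) the word vectors yourself.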