epfml / sent2vec

General purpose unsupervised sentence representations

embed_sentence returns float64 #36

Closed: Fethbita closed this issue 6 years ago

Fethbita commented 6 years ago

The Python module returns float64 from embed_sentence but float32 from embed_sentences. I don't think this should be the expected behavior, as it makes certain operations harder. For example, I have to write this:

    v = sent2vec_model.embed_sentence(text).reshape(1, -1).astype(np.float32)

instead of this:

    v = sent2vec_model.embed_sentence(text).reshape(1, -1)

I think it would be better if the two functions (embed_sentence and embed_sentences) were consistent.
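
Until the two functions agree, the cast can be kept in one place. The following is a minimal sketch using only NumPy; the helper name as_float32_row is hypothetical and not part of sent2vec, and it assumes the embedding arrives as a NumPy array (or anything NumPy can convert).

```python
import numpy as np


def as_float32_row(vec):
    """Return `vec` as a float32 array of shape (1, dim).

    Hypothetical helper, not part of sent2vec. np.asarray only copies
    when the dtype actually has to change, so float32 input is passed
    through without an extra copy.
    """
    return np.asarray(vec, dtype=np.float32).reshape(1, -1)


# Stand-in for the float64 vector described in this issue.
example = np.zeros(5, dtype=np.float64)
row = as_float32_row(example)
```

With a loaded model this would be called as as_float32_row(sent2vec_model.embed_sentence(text)).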

mpagli commented 6 years ago

I'm open to this modification, but I would recommend using only embed_sentences. embed_sentence was the first version implemented; it is now more or less deprecated and very inefficient compared to embed_sentences. When you use embed_sentence, the memory is copied multiple times to transform the C++ vector into a numpy array. embed_sentences is more optimized and does not copy any memory. Therefore, even for a single sentence, I would recommend using embed_sentences.
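
A single sentence can be routed through the batch function by wrapping it in a one-element list. This is a rough sketch, not code from the library: embed_single is a hypothetical wrapper, and it assumes only that the model object exposes embed_sentences taking a list of strings and returning one row per sentence.

```python
import numpy as np


def embed_single(model, text):
    """Embed one sentence via the batch API.

    Hypothetical wrapper: `model` is assumed to expose
    embed_sentences(list_of_str) returning an array with one row per
    sentence. The result is reshaped to (1, dim); np.asarray without a
    dtype argument keeps the dtype the model returned.
    """
    batch = model.embed_sentences([text])
    return np.asarray(batch).reshape(1, -1)
```

With the model from the original report, the call would be v = embed_single(sent2vec_model, text), with no astype needed if embed_sentences already returns float32 as described above.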