epfml / sent2vec

General purpose unsupervised sentence representations

How could I search better parameters? (or add getLoss() to python interface) #41

Closed by ghost 6 years ago

ghost commented 6 years ago

I'm trying to train an unsupervised sent2vec model on Japanese text. The paper reports good parameters for English Wikipedia, tweets, and so on, but they may not suit my Japanese corpus. To search for good parameters, I'd like to know the loss of a trained model.

I guess the getLoss() method is meant for this, but I could not find a Python equivalent. Would it be possible to add this method to the Python interface?

Thank you.

martinjaggi commented 6 years ago

For Japanese, the best parameters are indeed likely to be very different. Optimizing for a good loss value is not enough, though: you would need to evaluate on a test task, such as sentence similarity or word similarity in Japanese. For that you'd have to export the embeddings anyway, so I'm not sure just calling getLoss() would help. If you find a suitable dataset please let us know, we'd be interested to hear!
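
A minimal sketch of the kind of test-task evaluation suggested above: score a model by the Spearman correlation between the cosine similarity of its sentence embeddings and human similarity ratings. Everything here is an illustration, not part of sent2vec. The function names, the `embed` callable, and the toy data are hypothetical, and the pure-Python Spearman implementation is only there to keep the example dependency-free.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        return 0.0
    return cov / (dev_x * dev_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def similarity_score(embed, pairs, gold):
    """Correlate model similarities with gold ratings.

    embed: callable mapping a list of sentences to a list of vectors
           (hypothetical interface chosen for this sketch).
    pairs: list of (sentence_a, sentence_b) tuples.
    gold:  human similarity ratings, one per pair.
    """
    left = embed([a for a, _ in pairs])
    right = embed([b for _, b in pairs])
    predicted = [cosine(u, v) for u, v in zip(left, right)]
    return spearman(predicted, gold)


# Toy stand-in for a trained model (made-up vectors, not real embeddings).
TOY_VECTORS = {"s1": [1.0, 0.0], "s2": [1.0, 0.1], "s3": [0.0, 1.0], "s4": [1.0, 1.0]}


def toy_embed(sentences):
    return [TOY_VECTORS[s] for s in sentences]


toy_pairs = [("s1", "s2"), ("s1", "s4"), ("s1", "s3")]
toy_gold = [5.0, 3.0, 0.0]  # made-up ratings for illustration only
toy_score = similarity_score(toy_embed, toy_pairs, toy_gold)
```

To compare parameter settings, one would train one model per setting, pass each model's embedding function as `embed` (for the sent2vec Python wrapper this would presumably wrap `embed_sentences`, which is worth checking against the installed version), and keep the setting with the highest score on a held-out set of rated Japanese sentence pairs.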