UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Error in training_stsbenchmark_avg_word_embeddings.py #217

Open ogabrielluiz opened 4 years ago

ogabrielluiz commented 4 years ago

Hi! I'm trying to implement the avg word embeddings example and I think I found an error.

I think line 44 of examples/training_basic_models/training_stsbenchmark_avg_word_embeddings.py has the wrong in_features. Currently it is:

dan1 = models.Dense(in_features=sent_embeddings_dimension, out_features=sent_embeddings_dimension)

Shouldn't it be this instead?

dan1 = models.Dense(in_features=word_embedding_model.get_word_embedding_dimension(), out_features=sent_embeddings_dimension)

Otherwise I was getting a size mismatch RuntimeError pointing to the input. Putting the word embedding dimension as the input size works.

nreimers commented 4 years ago

Hi @ogabrielluiz, the example works on my machine as expected.

Note that the pooling layer concatenates the mean pooling and max pooling outputs if you set both values to true. In that case, the sentence embedding size is 2 * word_embedding_size. That's why I use pooling_model.get_sentence_embedding_dimension().

If you only apply mean pooling, pooling_model.get_sentence_embedding_dimension() will return the same value as get_word_embedding_dimension().

Best,
Nils Reimers
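
To make the dimension arithmetic from the reply above concrete, here is a minimal sketch in plain Python. It does not call sentence-transformers at all: pooled_dimension is a hypothetical helper, and the word embedding size of 300 is an assumed value chosen only for illustration. It just mirrors the rule described above, where each enabled pooling mode contributes one word-embedding-sized vector to the concatenated sentence embedding.

```python
# Hypothetical helper, NOT part of sentence-transformers. It only reproduces
# the arithmetic described in the thread: every enabled pooling mode adds one
# vector of size word_dim to the concatenated sentence embedding.
def pooled_dimension(word_dim, mean=True, max_=False):
    modes_enabled = int(mean) + int(max_)
    return word_dim * modes_enabled


word_dim = 300  # assumed word embedding size, for illustration only

# Mean pooling only: sentence embedding size equals the word embedding size.
mean_only = pooled_dimension(word_dim, mean=True, max_=False)

# Mean and max pooling: the two pooled vectors are concatenated.
mean_and_max = pooled_dimension(word_dim, mean=True, max_=True)

print(mean_only)
print(mean_and_max)
```

Under these assumptions, the Dense layer's in_features has to match whichever of the two values the pooling layer actually produces, which is why a mismatch between the two shows up as a size mismatch error at the layer's input.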

ogabrielluiz commented 4 years ago

Hey, @nreimers. Thanks for replying.

If I set both mean and max pooling to True, it runs, but in the example only mean pooling is set to True.

nreimers commented 4 years ago

Strange. What is the value of sent_embeddings_dimension if you have only mean pooling enabled?

Are you using the latest code from the repository or a version installed with pip?
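
One way to start answering the version question above is to look up the installed package metadata. The sketch below uses only the Python standard library (importlib.metadata, available from Python 3.8); installed_version is a hypothetical helper name. Note the limitation: it reports the version recorded at install time and cannot by itself tell a pip release apart from an editable install of the repository.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: query the distribution name used on PyPI.
version = installed_version("sentence-transformers")
print(version if version is not None else "no installed package metadata found")
```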