Closed — aced125 closed this issue 2 years ago
Great suggestion! I'll look into this more and see how we can integrate it. Right now we default to using the mean of the word embeddings from the second- or third-to-last layer. Still, the mean approach can bias clustering results for very long or very short sentences (which is why we expose a parameter for that). I'll run some known checks I've seen before against the new library to see whether the same issues exist there.
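For context, the mean-of-word-embeddings approach described above can be sketched as simple masked mean pooling over one hidden layer's token vectors. This is an illustrative sketch only (the function name and shapes are assumptions, not the library's actual code); padding tokens are masked out so they don't dilute the average:

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Mean-pool token embeddings from one hidden layer into a sentence vector.

    hidden_states: (num_tokens, dim) array for a single sentence
    attention_mask: (num_tokens,) array, 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(float)
    # Zero out padded positions, then average over the real tokens only.
    return (hidden_states * mask).sum(axis=0) / mask.sum()

# Toy example: two real tokens and one padding token.
tokens = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [0.0, 0.0]])
mask = np.array([1, 1, 0])
print(mean_pool(tokens, mask))  # -> [2. 3.]
```

Note that a plain mean like this weights every token equally, which is one reason sentence length can skew the resulting vectors, as mentioned above.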
Hey @dmmiller612! Any updates? I can attest that SBERT delivers better results in real systems. It would be great to use it here!
Any updates on this? It seems like a really good improvement.
Implemented in version 0.9.0
Hey Authors,
Since you are tokenizing each sentence separately, I suggest checking out this paper (Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks) and the corresponding repo (https://github.com/UKPLab/sentence-transformers) from UKP Lab in Germany.
They have shown that summing the BERT embeddings of each word to represent a sentence performs very poorly on benchmarks (though at least better than using the CLS token).
I know you are using the second- or third-to-last layer, but moving over to sentence-transformers is a trivial transition.
In short, using the mean of BERT embeddings achieves a Spearman correlation of 0.45 on STS benchmarks, whereas Sentence-BERT achieves 0.84 — a significant improvement.
The model is easy enough to use: