Hi @Cumberbatch08, sadly the USE papers (at least the ones I know) are extremely high-level and don't really go into the details. So it is unclear which architecture they used exactly and how the training was done (exact datasets, exact loss function, etc.).
Differences:
USE and SBERT both use transformer networks. For USE, it is sadly not clear how many layers they use (most technical details are not provided). USE was trained from scratch (as far as I can tell from the paper), while SBERT uses the BERT / RoBERTa pre-trained weights and just fine-tunes them to produce sentence embeddings.
I think the main difference is in the pre-training. USE uses a wide variety of datasets (exact details not provided), specifically targeted at generating sentence embeddings. BERT was pre-trained on a book corpus and on Wikipedia to produce a language model (see the BERT paper). SBERT then fine-tunes BERT to produce sensible sentence embeddings.
USE is in TensorFlow and tuning it for your use-case is not straightforward (the source code is not available, you only get the compiled model from tensorflow-hub). SBERT is based on PyTorch, and the goal of this repository is that fine-tuning for your use-case is as simple as possible (see the sketch below).
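To make that last point concrete, here is a minimal sketch of fine-tuning pre-trained BERT weights into a sentence embedder with sentence-transformers. The base model `bert-base-uncased`, the toy sentence pairs, and the hyperparameters are illustrative assumptions, not the exact setup from the paper:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses

# Start from pre-trained BERT weights and add a mean-pooling layer on top
word_embedding_model = models.Transformer('bert-base-uncased', max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Toy sentence pairs with similarity labels in [0, 1] (replace with your own data)
train_examples = [
    InputExample(texts=['A man is eating food.', 'A man is eating a meal.'], label=0.9),
    InputExample(texts=['A man is eating food.', 'The girl is playing guitar.'], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune the BERT weights so the cosine similarity of the embeddings matches the labels
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```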
haha, yes, I absolutely agree with what you said. USE doesn't publish many details, such as the layers, dataset, loss, etc. I found some information about the architecture: just as you said, the pre-training is probably what matters most.
Which would give better semantic search results: USE (https://tfhub.dev/google/universal-sentence-encoder/4) or the SBERT models (https://huggingface.co/sentence-transformers)?
@Gurutva SBERT works much better: https://arxiv.org/pdf/2104.08663v1.pdf
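For reference, a small semantic search sketch with sentence-transformers. The model name `all-MiniLM-L6-v2`, the corpus sentences, and `top_k` are placeholder assumptions; any pre-trained SBERT model from the hub works the same way:

```python
from sentence_transformers import SentenceTransformer, util

# Any pre-trained SBERT model from https://huggingface.co/sentence-transformers can be used here
model = SentenceTransformer('all-MiniLM-L6-v2')

corpus = [
    'A man is eating food.',
    'A woman is playing the violin.',
    'The new movie is awesome.',
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = 'Someone is having a meal.'
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search over the corpus embeddings, returning the best matches
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], round(hit['score'], 3))
```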
First, many thanks for your paper and code. I read the Universal Sentence Encoder (USE) paper; its architecture is also Siamese-like and it also uses the SNLI dataset, yet your results are much better. So I'm very interested in your work.
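As a side note, the Siamese NLI training mentioned above can be reproduced in this repository with the `losses.SoftmaxLoss` objective (the same encoder is applied to both sentences, with a classification head over the combined embeddings). The toy premise/hypothesis pairs and the base model name below are illustrative assumptions only:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses

# Siamese setup: the same BERT encoder + mean pooling is applied to both sentences
word_embedding_model = models.Transformer('bert-base-uncased')
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Toy NLI-style pairs; labels: 0 = contradiction, 1 = entailment, 2 = neutral
train_examples = [
    InputExample(texts=['A man inspects a uniform.', 'The man is sleeping.'], label=0),
    InputExample(texts=['A soccer game with males playing.', 'Some men are playing a sport.'], label=1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Softmax classifier over the concatenated sentence embeddings (u, v, |u - v|)
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```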