I am using 'sentence-transformers/all-mpnet-base-v2'. My question is: what happens when I encode a text that is longer than 384 tokens?
Does the model embed the sentences in the longer text separately?
If so, how does the model calculate the final embedding for my text?
example:
text="""We evaluate the performance of SBERT for common Semantic Textual Similarity (STS) tasks. State-of-the-art methods often learn a (complex)
regression function that maps sentence embeddings to a similarity score. However, these regression functions work pair-wise and due to the combinatorial explosion those are often not scalable if
the collection of sentences reaches a certain size.
Instead, we always use cosine-similarity to compare the similarity between two sentence embeddings. We ran our experiments also with negative Manhatten and negative Euclidean distances
as similarity measures, but the results for all approaches remained roughly the same."""
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
embedding = model.encode(text)
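For context, here is a small check one could run (reusing the `model` and `text` variables from the snippet above) to see the model's configured sequence limit and how many tokens the passage actually produces with the model's own tokenizer. This is only a sketch to illustrate what I mean by "longer than 384 tokens", not an answer to the question:

print(model.max_seq_length)  # configured limit for this model, reported as 384

# Tokenize the passage without truncation to count its tokens,
# including the special tokens added by the tokenizer.
token_ids = model.tokenizer(text, truncation=False)['input_ids']
print(len(token_ids))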