I'm a bit puzzled by something I encountered trying to encode sentences as embeddings. When I ran the sentences through the model one at a time, I got slightly different results from when I ran batches of sentences.
I've reduced an example down to:
from transformers import pipeline
import numpy as np
p = pipeline('feature-extraction', model='allenai/scibert_scivocab_uncased')
s = 'the scurvy dog walked home alone'.split()
for l in range(1, len(s) + 1):
    txt = ' '.join(s[:l])
    res1 = p(txt)
    res2 = p(txt)
    res1_2 = p([txt, txt])
    print(l, txt, len(res1[0]))
    print(all(np.allclose(i, j) for i, j in zip(res1[0], res2[0])),
          all(np.allclose(i, j) for i, j in zip(res2[0], res1_2[0])),
          all(np.allclose(i, j) for i, j in zip(res1_2[0], res1_2[1])))
The output I get is:
1 the 3
True False True
2 the scurvy 6
True True True
3 the scurvy dog 7
True False False
4 the scurvy dog walked 9
True False True
5 the scurvy dog walked home 10
True True False
6 the scurvy dog walked home alone 11
True True True
So running a single sentence through the model gives the same output each time, but running a batch containing the same sentence twice sometimes gives different results (both between the two outputs within the batch, and compared to the single-sentence case).
Is this expected/explainable?
Further context: I'm running on CPU (a laptop), Python 3.8.9, in a freshly installed venv.
The difference usually affects only a few indices of the embeddings and can be up to 1e-3. It is negligible when comparing the embeddings by cosine distance, but I'd like to understand where it comes from before dismissing it.
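For reference, this is roughly how I checked that the difference is negligible under cosine distance. It's a plain-NumPy sketch: the synthetic vector and the 1e-3 perturbation of a few indices stand in for the actual embeddings and the discrepancy I observed, so the exact numbers are illustrative only.

```python
import numpy as np

def cosine_distance(a, b):
    # 1 minus the cosine similarity of two vectors
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
emb = rng.standard_normal(768)      # stand-in for a 768-dim token embedding
perturbed = emb.copy()
perturbed[:5] += 1e-3               # perturb a few indices, like the observed diff

# The cosine distance between the two vectors is many orders of
# magnitude below anything that would affect downstream similarity.
print(cosine_distance(emb, perturbed))
```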