wufeim closed this issue 1 month ago
Hi @wufeim,
Thank you for your interest in our work and for raising this issue. While exploring it, we actually uncovered a bug in our code (#74).
Firstly, the output for batch size 1 is very different from the output for batch size >1 because of a bug in the implementation of `bidirectional_llama.py`. I have pushed a fix (#75). If you plan to use a batch size of 1 with `encode`, you should build the `llm2vec` package from source using the latest changes. The bug does not affect runs where the batch size is greater than 1.
Regarding the output being different for batch sizes 2, 3, and 4: this is a known issue with the transformers library. It happens due to the accumulation of matrix-multiplication rounding errors, which is more pronounced in lower precision such as bf16. Here is a detailed explanation by one of the maintainers of the transformers library.
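As a small illustration of the batch-size effect (plain NumPy, not llm2vec or transformers code; shapes are made up): the same row pushed through a weight matrix on its own versus inside a larger batch may take different BLAS code paths, so the two results agree only up to floating-point tolerance, not necessarily bitwise.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096)).astype(np.float32)  # hypothetical weight matrix
x = rng.standard_normal((4, 4096)).astype(np.float32)     # a "batch" of 4 inputs

single = x[:1] @ W       # first row computed as a batch of 1
batched = (x @ W)[:1]    # the same row computed inside the batch of 4

# The results are numerically close but, depending on which kernels BLAS
# picks for each shape, they need not be identical down to the last bit.
max_diff = np.abs(single - batched).max()
print(max_diff)
```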
By changing the precision to fp32, the variability across batch sizes will shrink quite a lot, but it will never reach zero (see the tests run in the detailed explanation above).
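A quick way to see why lower precision amplifies the variability (a generic floating-point sketch, unrelated to the llm2vec or transformers internals): naively accumulating the same value in a 16-bit accumulator drifts far more than in a 32-bit one. The sketch uses float16 because NumPy has no bf16 type; bf16 has even fewer mantissa bits (7) than float16 (10), while fp32 has 23.

```python
import numpy as np

def naive_sum(value, n, dtype):
    """Accumulate `value` n times, rounding to `dtype` after every add."""
    acc = dtype(0.0)
    v = dtype(value)
    for _ in range(n):
        acc = dtype(acc + v)
    return float(acc)

n = 100_000
exact = 0.1 * n  # 10000.0

err16 = abs(naive_sum(0.1, n, np.float16) - exact)
err32 = abs(naive_sum(0.1, n, np.float32) - exact)

# The half-precision accumulator eventually stagnates: once `acc` is large
# enough, each 0.1 increment falls below half a unit-in-the-last-place and
# is rounded away entirely, so its error dwarfs the float32 error.
print(err16, err32)
```

The same mechanism is at work inside a matmul: each dot product is a long accumulation, and the fewer mantissa bits you have, the more the result depends on the exact order and grouping of the additions, which is what changes with the batch size.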
Here are more related issues on the transformers library for reference:
- https://github.com/huggingface/transformers/issues/26869
- https://github.com/huggingface/transformers/issues/27626

And one on llama.cpp:
- https://github.com/ggerganov/llama.cpp/issues/3014
Hope this answers your question; let me know if you have any more queries.
Closing as it is stale, feel free to re-open if the issue persists
Thanks for sharing this awesome work.
I'm trying a simple symmetric text retrieval demo, which involves computing text embeddings for retrieval. What I don't understand is why I get different embeddings when I run `l2v` with one caption versus multiple captions.

The `print` statements always output the first 10 of the 4096 values of the first caption's embedding. I expect all `print` statements to output the same values, but they don't. Am I misunderstanding something here? Thanks for your help!