ggerganov / llama.cpp

LLM inference in C/C++

Question: Why do GPU and CPU embedding outputs differ for the same input? Is this normal? #7608

Open jygmysoul opened 1 month ago

jygmysoul commented 1 month ago

Background Description

I am using the embedding example. The execution parameters are as follows:

embedding.exe -ngl 200000 -m I:\JYGAIBIN\MetaLlamaModel\Llama2-13b-chat\ggml-model-f32_q4_1.gguf --log-disable -p "Hello World!"
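
For context, my understanding is that this command maps onto roughly the following C API calls from llama.h (a sketch only; function and field names have changed between versions, so treat every identifier here as approximate, not canonical):

```cpp
// Rough sketch of an embedding run through the llama.cpp C API.
// NOTE: names/signatures below may differ between llama.cpp versions --
// this is illustrative, not a verified minimal program.
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 0; // 0 = pure CPU; a huge value (e.g. 200000) offloads every layer

    llama_model * model = llama_load_model_from_file("ggml-model-f32_q4_1.gguf", mparams);

    llama_context_params cparams = llama_context_default_params();
    cparams.embeddings = true; // request embedding output rather than just logits

    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // tokenize the prompt
    const std::string prompt = "Hello World!";
    std::vector<llama_token> tokens(prompt.size() + 8);
    const int n_tok = llama_tokenize(model, prompt.c_str(), (int) prompt.size(),
                                     tokens.data(), (int) tokens.size(),
                                     /*add_special=*/true, /*parse_special=*/false);
    tokens.resize(n_tok);

    // evaluate the prompt, then read back the embedding vector
    llama_decode(ctx, llama_batch_get_one(tokens.data(), n_tok, 0, 0));
    const float * emb = llama_get_embeddings(ctx);

    for (int i = 0; i < 3 && i < llama_n_embd(model); i++) {
        printf("%.8e\n", emb[i]);
    }

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

The only difference between my CPU and GPU runs is the number of offloaded layers (the -ngl flag); the model file and prompt are identical.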

The first three embedding values output when the embedding runs on the CPU:
-4.67528416e-08 -1.07059577e-06 1.76811977e-06

The first three embedding values output when it runs on the GPU (-ngl 200000):
5.86615059e-08 -1.02221782e-06 1.78800110e-06
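
To quantify the gap, here is a small self-contained comparison of the values above (a hypothetical helper, not part of llama.cpp; the tolerances are illustrative assumptions):

```cpp
// Compare the reported CPU and GPU values with an absolute/relative
// tolerance instead of expecting bitwise equality.
#include <cmath>
#include <cstdio>

static bool nearly_equal(float a, float b,
                         float abs_tol = 1e-6f, float rel_tol = 1e-3f) {
    const float diff = std::fabs(a - b);
    return diff <= abs_tol || diff <= rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}

int main() {
    // first three embedding values reported above
    const float cpu[3] = { -4.67528416e-08f, -1.07059577e-06f, 1.76811977e-06f };
    const float gpu[3] = {  5.86615059e-08f, -1.02221782e-06f, 1.78800110e-06f };

    for (int i = 0; i < 3; i++) {
        printf("i=%d |diff|=%.3e nearly_equal=%d\n",
               i, std::fabs(cpu[i] - gpu[i]), (int) nearly_equal(cpu[i], gpu[i]));
    }
    return 0; // all three differ by ~1e-7 or less and pass the tolerance check
}
```

For whole vectors, cosine similarity between the CPU and GPU embeddings is a more meaningful check than element-wise comparison, since near-zero components like these carry almost no signal on their own.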

Why do the outputs differ for the same "Hello World!" input? Does llama.cpp currently support embedding correctly on both the GPU and the CPU?

Also, does llama.cpp provide documentation for the underlying API functions, or notes on usage precautions? Is there an interface documentation website besides what is on GitHub? Thank you.

Possible Answer

I expected that, for the same input content, the GPU and CPU would output identical embedding values.
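
A likely explanation, though: bitwise equality across backends is not something llama.cpp (or floating-point code in general) can promise. The CPU and GPU backends implement the same math with different kernels, different reduction orders, and different fusion of operations, and floating-point addition is not associative, so results legitimately differ by rounding noise; the differences above are on the order of 1e-7, which is consistent with f32 round-off accumulated through a 13B-parameter model. A minimal standalone demonstration of the underlying effect (plain C++, not llama.cpp code):

```cpp
// Floating-point addition is not associative: summing the same numbers in a
// different order (as a GPU reduction typically does) changes the result.
#include <cstdio>

int main() {
    const float vals[4] = { 1.0e8f, 1.0f, 1.0f, -1.0e8f };

    // left-to-right, as a simple CPU loop might do:
    // (1e8f + 1.0f) rounds back to 1e8f, so both 1.0f contributions are lost
    float left_to_right = 0.0f;
    for (float v : vals) {
        left_to_right += v;
    }

    // a different order, as a parallel/tree reduction might produce:
    // the large values cancel first, so the small ones survive
    const float reordered = (vals[0] + vals[3]) + (vals[1] + vals[2]);

    printf("left-to-right: %.1f\n", left_to_right); // prints 0.0
    printf("reordered:     %.1f\n", reordered);     // prints 2.0
    return 0;
}
```

Scaled down to realistic magnitudes and repeated across millions of multiply-adds per token, this reordering effect easily produces per-element differences of ~1e-7, so outputs like the ones above are expected and normal.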

jygmysoul commented 3 weeks ago

UP