Maybe related to https://github.com/ggerganov/llama.cpp/pull/5796
I think so. Hopefully it will be fixed by that.
Can you check the cosine distance of vector produced by embedding.cpp vs server.cpp?
Also maybe try without GPU offloading?
> Can you check the cosine distance of vector produced by embedding.cpp vs server.cpp?
> Also maybe try without GPU offloading?
I tried without GPU offloading and got the same output.
As for the cosine distance, I calculated the cosine distance between the word `prince` and a list of words `["king", "queen", "apple", "orange"]` and sorted the results:
from embedding.cpp output: [('king', 0.4116078336488638), ('queen', 0.4211467172721288), ('apple', 0.6682980126084468), ('orange', 0.6874219028515791)]
from server.cpp: [('orange', 0.009215513122614483), ('king', 0.009233457008902879), ('queen', 0.01777521161063844), ('apple', 0.020477966154721194)]
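For context, a minimal sketch of how such a ranking can be computed (the helper name and the `emb` lookup are illustrative, not from the thread; the vectors would come from `./embedding` or the server):

```python
import math

def cosine_distance(a, b):
    # 1 - cos(a, b); smaller means more similar
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) *
                        math.sqrt(sum(x * x for x in b)))

# emb = {"prince": [...], "king": [...], ...}   # one vector per word
# ranked = sorted([(w, cosine_distance(emb["prince"], emb[w]))
#                  for w in ["king", "queen", "apple", "orange"]],
#                 key=lambda t: t[1])
```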
There's currently a refactoring of the server code in progress; maybe this will be fixed by #5882.
It looks like this is actually a tokenization issue. I'm seeing the output of `/tokenize` as being pretty garbled. First, it doesn't appear to be adding a BOS token. We're currently not specifying the `add_bos_token` flag in the GGUF files for embeddings, so we might want to do that.

Second, it looks like something is up with `special_tokens_cache`. It seems to be adding in any token that is the concatenation of two other valid tokens, but that ends up being tons of regular words in addition to actual special tokens. The cache isn't used for regular embeddings, but the server seems to want to use it.

Edit: If you force it to add a BOS token and turn off special token processing, the tokenization comes out correct. And in that case the embedding numbers are correct too, though they're not normalized, so they won't look the same as the output from `embedding`.
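One way to see the garbled tokenization directly is to ask the server what it produced; a minimal sketch, assuming the server from the repro below is running with `--embedding` on port 8019 and exposes the usual `POST /tokenize` route:

```python
# Sketch: inspect the server's tokenization (assumes /tokenize takes
# {"content": ...} and returns {"tokens": [...]}).
import requests

resp = requests.post("http://localhost:8019/tokenize", json={"content": "prince"})
print(resp.json()["tokens"])
# For a BERT-style model like all-MiniLM-L6-v2 the list should start with the
# [CLS]/BOS token (id 101); its absence is the missing-BOS symptom described above.
```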
Yes, the `special` flag is always on in `server`:

And this seems to tokenize incorrectly. Not sure if this is somehow a problem with the vocab or if we simply need to turn off the `special` flag when using embedding models.
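For comparison, a sketch of the reference tokenization via Hugging Face transformers (assuming the original all-MiniLM-L6-v2 checkpoint is available; for a BERT-style model the ids should be bracketed by [CLS] and [SEP]):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
ids = tok("prince")["input_ids"]
print(ids)                              # expected shape: [101, ..., 102]
print(tok.convert_ids_to_tokens(ids))   # i.e. ['[CLS]', ..., '[SEP]']
```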
We should fix this, and the normalization, after we merge #5882.
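For reference, a minimal sketch of the L2 normalization mentioned above (plain Python, no llama.cpp specifics assumed):

```python
import math

def l2_normalize(v):
    # Scale the vector to unit length; cosine similarity is unchanged by this,
    # only the magnitude of the individual values is.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]
```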
Trying to figure out what's up with `special_tokens_cache` and looking through https://github.com/ggerganov/llama.cpp/pull/3538 for guidance. Most models I'm looking at seem to correctly label special tokens with `token_type`. Do we have any examples of models that fail to do this properly? Seems like the kind of thing that should be taken care of during GGUF conversion.
As I posted above, the embedding I got from `embedding.cpp` is the same as what I got from the original model, so I guess it's not a GGUF conversion issue. My observation is: with the SAME input and SAME GGUF model, embedding.cpp and server.cpp yield different output.
This issue was closed because it has been inactive for 14 days since being marked as stale.
System: Mac M2 Max, macOS Sonoma 14.2.1
llama.cpp version: latest main branch as of Feb 29, 2024

Steps to reproduce:
python convert-hf-to-gguf.py --outfile minilm.gguf --outtype f16 all-MiniLM-L6-v2
Output:
./server -ngl 99 -m minilm.gguf --port 8019 --host 0.0.0.0 --embedding
Output:
Expected Behavior: the embeddings from these two approaches should be the same.
Actual Behavior: as you can see, the output embedding looks completely different from the one in step 3; not only the values but also the scales differ.
By the way, the embedding output I get from step 3 is almost the same as the one I get from the sentence_transformers Python library, for example:
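A minimal sketch of that comparison (the checkpoint name is taken from the conversion step in the repro; the original output snippet is not reconstructed here):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode("prince")   # numpy vector to compare against step 3's output
print(emb[:5])
```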
This indicates that the model conversion works correctly. I think there's something wrong with the BERT embedding path in server mode.