ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: The output content is different #8585

Closed: yancaoweidaode closed this issue 3 days ago

yancaoweidaode commented 1 month ago

What happened?

First, I set the seed to 1 and the temperature to 0 so that the LLM always produces the same output for the same input. For example, using llama3-8b, when I input

"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nhello, who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", the output is

9906: Hello1070: there0: !358: I2846: 'm264: a11190: helpful18328: assistant11: ,1618: here311: to7945: assist499: you449: with904: any4860: questions477: or9256: tasks499: you1253: may617: have13: .358: I2846: 'm264: a6500: computer2068: program6319: designed311: to3619: understand323: and6013: respond311: to5933: natural4221: language11: ,779: so499: you649: can6369: chat449: with757: me1120: just1093: like499: you1053: would449: with264: a4333: friend382: .

40: I649: can1520: help499: you449: with264: a7029: wide2134: range315: of2574: things11: ,1778: such439: as1473: :

9: 22559: Answer287: ing4860: questions389: on5370: various13650: topics11: ,505: from8198: science323: and3925: history311: to16924: entertainment323: and7829: culture198: 9: 81200: Providing17931: definitions323: and41941: explanations369: for4339: words323: and32847: phrases198: 9: 67118: Offering18726: suggestions323: and19075: recommendations369: for6603: books11: ,9698: movies11: ,4731: music11: ,323: and810: more198: 9: 2755: Ass11330: isting449: with4221: language14807: translation323: and32528: grammar27358: correction198: 9: 97554: Generating6848: ideas323: and87881: brainstorm287: ing10105: solutions311: to5435: problems198: 9: 1628: And1790: much810: more2268: !

4516: So11: ,1148: what596: 's389: on701: your4059: mind30: ?3234: Do499: you617: have264: a3230: specific3488: question477: or8712: topic499: you4265: 'd1093: like311: to4358: discuss30: ?358: I2846: 'm682: all25212: ears0: !128009: [end of text].

I printed out both the sampled token ids and the corresponding text. Then I appended the first token id of the output to the end of the input token sequence, i.e. embd_inp.push_back(9906), and the output I got was 1070: there0: !358: I2846: 'm459: an15592: AI18328: assistant11: ,6319: designed311: to1520: help499: you449: with264: a7029: wide2134: range315: of9256: tasks323: and4860: questions13: .358: I2846: 'm264: a5780: machine6975: learning1646: model11: ,16572: trained389: on264: a13057: vast3392: amount315: of1495: text828: data11: ,902: which20682: enables757: me311: to3619: understand323: and6013: respond311: to5933: natural4221: language11374: inputs382: .

40: I649: can7945: assist499: you449: with4395: everything505: from4689: general6677: knowledge323: and74032: trivia311: to810: more3230: specific13650: topics1093: like8198: science11: ,3925: history11: ,323: and5557: technology13: .358: I649: can1101: also1520: help499: you449: with4221: language14228: -related9256: tasks1778: such439: as4221: language14807: translation11: ,1495: text29385: summar2065: ization11: ,323: and1524: even4477: writing18726: suggestions382: .

40: I2846: 'm1618: here311: to1520: help499: you304: in904: any1648: way358: I649: can11: ,779: so2733: feel1949: free311: to2610: ask757: me4205: anything430: that596: 's389: on701: your4059: mind13: .3639: What596: 's389: on701: your4059: mind3432: today30: ?128009: [end of text].

Obviously, the two outputs are not the same. However, I would think that, because of the causal mask, the KV cache produced by the two runs should be identical. So why do the outputs differ? Did I miss a setting, or is there a bug somewhere in the code?
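For reference, the experiment described above boils down to something like the following condensed sketch, in the style of examples/main/main.cpp. API and helper names are taken from the llama.cpp tree around this commit and should be treated as assumptions; error handling is omitted for brevity.

```cpp
#include "common.h"
#include "sampling.h"
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

// Decode the tokens in one batch and greedily sample the next token (temp = 0).
static llama_token greedy_next(llama_context * ctx, std::vector<llama_token> & inp) {
    llama_kv_cache_clear(ctx); // start each run from an empty KV cache
    llama_decode(ctx, llama_batch_get_one(inp.data(), (int32_t) inp.size(), 0, 0));

    llama_sampling_params sparams;
    sparams.temp = 0.0f; // greedy argmax over the last token's logits
    llama_sampling_context * ctx_sampling = llama_sampling_init(sparams);
    const llama_token id = llama_sampling_sample(ctx_sampling, ctx, nullptr);
    llama_sampling_free(ctx_sampling);
    return id;
}

int main() {
    llama_backend_init();
    llama_model   * model = llama_load_model_from_file("llama3-8b.gguf", llama_model_default_params());
    llama_context * ctx   = llama_new_context_with_model(model, llama_context_default_params());

    const std::string prompt = "...the chat-formatted prompt from the report...";
    std::vector<llama_token> embd_inp = ::llama_tokenize(ctx, prompt, true, true); // parse special tokens

    // run 1: first sampled token after the prompt (9906 = "Hello" in the report)
    const llama_token first = greedy_next(ctx, embd_inp);
    printf("run 1 first token: %d\n", first);

    // run 2: the same prompt with that token appended; in theory the
    // continuation should match run 1 shifted by one token
    embd_inp.push_back(first); // i.e. embd_inp.push_back(9906)
    printf("run 2 next token:  %d\n", greedy_next(ctx, embd_inp));

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```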

Name and Version

llama-cli, version e02b597be3702174e7b47b44cd03e1da1553284b, built with CMake (Windows 11)

What operating system are you seeing the problem on?

No response

Relevant log output

No response

ggerganov commented 1 month ago

Can you confirm that, when adding -b 1, the results are the same?

yancaoweidaode commented 1 month ago

> Can you confirm that, when adding -b 1, the results are the same?

I tried adding this parameter, but it could not produce output because of the check "logical batch size for prompt processing (must be >=32 to use BLAS)".

ggerganov commented 1 month ago

What about -ub 1?

ngxson commented 1 month ago

Maybe related to #8593, a problem with seeding for sampling.

yancaoweidaode commented 1 month ago

-ub 1 works. I tried it, and the outputs of the two attempts are now the same. However, it is strange that for the same prompt the outputs differ between n_ubatch=1 and n_ubatch=512; I can't figure out where the problem might be.
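A plausible explanation (not confirmed in this thread) is that the micro-batch size decides which matmul path processes the prompt: with n_ubatch = 1 every token goes through the same single-row path as generation, while larger micro-batches use batched kernels (e.g. BLAS) whose floating-point accumulation order differs. The logits then differ in the last few bits, and with temp 0 a near-tie between the top two tokens is enough to flip the argmax, after which the two continuations diverge. In code, the two flags map to gpt_params fields roughly as follows (field names as in common.h of this era; treat them as assumptions):

```cpp
#include "common.h"

int main() {
    gpt_params params;
    params.n_batch  = 2048; // -b:  logical batch size for prompt processing
    params.n_ubatch = 1;    // -ub: physical micro-batch size; 1 forces the prompt
                            //      through the same single-token path as generation
    // the context created from these params (e.g. via llama_init_from_gpt_params)
    // then evaluates the prompt one token at a time
    return 0;
}
```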

yancaoweidaode commented 1 month ago

> Maybe related to #8593, a problem with seeding for sampling.

I'm afraid that might not be the case, because I've already set the seed to 1 in common.h.
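For what it's worth, with the temperature at 0 the sampler takes the greedy path and never draws from the RNG, so the sampling seed should indeed be irrelevant here. A sketch of pinning both settings via the parameters instead of editing common.h (field names as in gpt_params of this era; treat them as assumptions):

```cpp
#include "common.h"

int main() {
    gpt_params params;
    params.seed         = 1;    // RNG seed, equivalent to --seed 1 on the CLI
    params.sparams.temp = 0.0f; // temperature 0 -> deterministic greedy sampling,
                                // so the seed does not influence token choice
    return 0;
}
```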

github-actions[bot] commented 3 days ago

This issue was closed because it has been inactive for 14 days since being marked as stale.