evilsocket / cake

Distributed LLM and StableDiffusion inference for mobile, desktop and server.

bug with tokenizer and gibberish output #9

Open evilsocket opened 1 month ago

evilsocket commented 1 month ago

The tokenizer has issues resolving a few tokens, including special ones (they will be shown in the output as ), which is causing all sorts of gibberish output. It's probably a matter of parsing the model/tokenizer.json properly.
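A quick way to see where the tokens get lost is to load the same tokenizer.json with the `tokenizers` crate (which candle builds on) and check that the Llama 3 special tokens round-trip. This is a standalone sketch, not cake code; the file path and the list of special tokens are assumptions based on the standard Llama 3 instruct template.

```rust
// Standalone check (not part of cake): does tokenizer.json resolve the
// Llama 3 special tokens? Uses the `tokenizers` crate that candle relies on.
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // Path is an assumption; point it at the tokenizer.json shipped with the model.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;

    // Special tokens the Llama 3 instruct template depends on.
    for tok in [
        "<|begin_of_text|>",
        "<|start_header_id|>",
        "<|end_header_id|>",
        "<|eot_id|>",
    ] {
        match tokenizer.token_to_id(tok) {
            // Round-trip id -> string; a mismatch or None points at the parsing bug.
            Some(id) => println!("{tok} -> id {id} -> {:?}", tokenizer.id_to_token(id)),
            None => println!("{tok} -> NOT FOUND"),
        }
    }

    // Encode a templated prompt and decode it back without skipping special
    // tokens; anything that comes back mangled is a token the parser dropped.
    let prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nhi<|eot_id|>";
    let encoding = tokenizer.encode(prompt, false)?;
    println!("{}", tokenizer.decode(encoding.get_ids(), false)?);
    Ok(())
}
```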

evilsocket commented 1 month ago

The model responds well when the prompt template is not used (https://github.com/evilsocket/cake/blob/main/cake-core/src/models/llama3/llama.rs#L266).
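In other words, the gibberish only shows up once the prompt is wrapped in the Llama 3 instruct template, i.e. once the unresolved special tokens enter the input. A hedged sketch of the two paths (the helper below is hypothetical, not cake's API; only the template string follows the standard Llama 3 format):

```rust
// Hypothetical helper, not cake's API: contrasts the templated prompt (which
// pulls in the special tokens the tokenizer currently mangles) with the raw
// completion-style prompt that works as a temporary workaround.
fn build_prompt(user_text: &str, use_template: bool) -> String {
    if use_template {
        // Llama 3 instruct template: depends on the special tokens.
        format!(
            "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n\
             {user_text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        )
    } else {
        // Workaround: plain text, no special tokens, no gibberish.
        user_text.to_string()
    }
}

fn main() {
    println!("{}", build_prompt("What is distributed inference?", true));
    println!("{}", build_prompt("What is distributed inference?", false));
}
```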

evilsocket commented 1 month ago

This is also happening with the candle llama example code; see https://github.com/huggingface/candle/issues/2341

caohuilong commented 1 month ago

Has this been fixed? I'm running into the same problem.

evilsocket commented 1 month ago

@caohuilong the issue is still open and the bug has not been fixed; we're waiting for the candle team to respond.