deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself
https://coder.deepseek.com/
MIT License
6.01k stars 433 forks

When generating content with DeepSeek Coder, the output contains the character � instead of the expected characters. #88

Closed BinhMinhs10 closed 6 months ago

BinhMinhs10 commented 6 months ago

Details: When generating content with DeepSeek Coder, the output includes the replacement character �, which is not the intended behavior. This hurts the readability and usability of the generated content.

Expected Behavior: The generated content should use the appropriate characters and should not contain � in the output.

Screenshots: Screenshot from 2023-12-29 09-59-00

Steps to Reproduce: Run the DeepSeek Coder generation process with a prompt such as "Bạn có thể làm gì" (Vietnamese for "What can you do?"), then examine the generated content.

DOGEwbx commented 6 months ago

Hello @BinhMinhs10, I believe the letter that should appear in place of � is ú. This is a decoding mistake originating from the Huggingface tokenizers library. You can find more information at this link: https://github.com/deepseek-ai/DeepSeek-LLM/issues/9#issuecomment-1835679866
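
For anyone curious how a � can show up in place of a letter like ú, here is a minimal sketch using only the Python standard library. It assumes the cause is a multi-byte UTF-8 character being split across two decoded pieces, which is a common failure mode with byte-level tokenizers. It does not reproduce the actual Huggingface code path, and the `pieces` list and the word "giúp" are hypothetical, chosen only because they contain ú.

```python
import codecs

# Hypothetical byte pieces: the word "giúp", with the two UTF-8 bytes of "ú"
# (0xC3 0xBA) split across two pieces. This is an assumption for illustration,
# not actual tokenizer output.
pieces = [b"gi\xc3", b"\xbap"]


def decode_per_piece(chunks):
    """Decode every chunk on its own; incomplete sequences become U+FFFD."""
    return "".join(c.decode("utf-8", errors="replace") for c in chunks)


def decode_joined(chunks):
    """Concatenate all bytes first, then decode once."""
    return b"".join(chunks).decode("utf-8", errors="replace")


def decode_streaming(chunks):
    """Decode chunk by chunk, buffering incomplete sequences between chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    out = [decoder.decode(c) for c in chunks]
    out.append(decoder.decode(b"", final=True))  # flush any leftover bytes
    return "".join(out)


print(decode_per_piece(pieces))   # contains U+FFFD where "ú" should be
print(decode_joined(pieces))      # giúp
print(decode_streaming(pieces))   # giúp
```

If this is indeed the mechanism, decoding the full output at once, or buffering incomplete byte sequences when streaming, should avoid the �.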

BinhMinhs10 commented 6 months ago

@DOGEwbx Thank you, I'm really looking forward to your update!