p3d-dev closed 7 months ago
Hi @p3d-dev, thank you for using deepseek-llm. The issue you're experiencing originates in our tokenizer. We included a few German and special characters as added tokens, which triggers a bug in the HuggingFace tokenizer. We have raised an issue at https://github.com/huggingface/tokenizers/issues/1392.
The tokenizer decodes the letters in ["ø","ö","ú","ÿ","õ","÷","û","ý","À","ù","Á","þ","ü"] to meaningless bytes. ollama apparently ignores these tokens during decoding.
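To illustrate what "bytes without meaning" can look like, here is a minimal sketch. It assumes the tokenizer uses the GPT-2 style byte-level alphabet, in which each byte value 0..255 is represented by one printable character. The helper names (`bytes_to_unicode`, `byte_level_decode`, `byte_level_encode`, `is_valid_utf8`) are illustrative only and are not the HuggingFace code path. Under that assumption, a token consisting of one of the listed letters is read back as a single raw byte rather than as the letter itself.

```python
import unicodedata

# Letters listed in the comment above, normalized to single code points.
LETTERS = [unicodedata.normalize("NFC", s) for s in
           ["ø", "ö", "ú", "ÿ", "õ", "÷", "û", "ý", "À", "ù", "Á", "þ", "ü"]]


def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 style table: every byte value 0..255 gets one printable character."""
    printable = (list(range(ord("!"), ord("~") + 1))
                 + list(range(ord("¡"), ord("¬") + 1))
                 + list(range(ord("®"), ord("ÿ") + 1)))
    # Printable bytes are represented by the character with the same code point.
    table = {b: chr(b) for b in printable}
    # Remaining bytes (controls, space, ...) are shifted to code points >= 256.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {ch: b for b, ch in BYTE_TO_CHAR.items()}


def byte_level_decode(token: str) -> bytes:
    """Assumed decoding step: map each token character back to its raw byte."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def byte_level_encode(text: str) -> str:
    """How ordinary text looks in the byte-level alphabet (UTF-8 bytes first)."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for letter in LETTERS:
        raw = byte_level_decode(letter)
        # Columns: letter, raw byte (hex), valid UTF-8?, byte-level form of the real letter
        print(letter, raw.hex(), is_valid_utf8(raw), repr(byte_level_encode(letter)))
```

Under this assumption, the 13 letters map to the byte values C0, C1 and F5 through FF, which never occur in well-formed UTF-8. A real "ö" is the two-byte sequence C3 B6, so a single-letter token read as a byte cannot be turned back into text.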
Given that modifying our vocabulary may introduce new complications, we decided not to rectify this bug in the current version. However, we would like to assure you that it will be addressed in the forthcoming update of our model. Thank you for your understanding.
Here are the responses from a few models; deepseek-llm cannot output "ö" and "ü":
Is this a problem with the model or with ollama?