I periodically encounter infinite generation with Qwen 2.5 7B Coder under FP8 quantization when feeding long inputs (roughly 20k+ characters) into the context.
I'm looking at the model's configs:
https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/config.json
https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/generation_config.json
https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/tokenizer_config.json
Overall, the three files seem consistent with one another.
But I have a question. In config.json, "bos_token_id" is 151643, which the tokenizer maps to "<|endoftext|>", and "eos_token_id" is 151645, which maps to "<|im_end|>". In generation_config.json, however, "bos_token_id" is 151643 ("<|endoftext|>"), "pad_token_id" is 151643 ("<|endoftext|>"), and "eos_token_id" is [151645, 151643], a list of the two tokens that were previously the eos and bos tokens: "<|im_end|>" and "<|endoftext|>". Finally, tokenizer_config.json has:
"bos_token": null, "eos_token": "<|im_end|>", "pad_token": "<|endoftext|>",
where the bos token should probably be 151644 ("<|im_start|>") explicitly, rather than 151643 ("<|endoftext|>").
In short, these three configs have completely confused me.
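For what it's worth, here is a minimal sketch of mine (not from the configs themselves; it assumes the transformers library and access to the Hugging Face Hub) that checks what the shipped tokenizer actually maps these ids to:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

# Map the ids cited in the three configs back to their token strings.
for token_id in (151643, 151644, 151645):
    print(token_id, "->", tok.convert_ids_to_tokens(token_id))
# 151643 -> <|endoftext|>
# 151644 -> <|im_start|>
# 151645 -> <|im_end|>

# The special-token attributes the tokenizer itself exposes:
print("bos:", tok.bos_token, "eos:", tok.eos_token, "pad:", tok.pad_token)
# bos: None eos: <|im_end|> pad: <|endoftext|>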
Hmm, I also found this "Important" note in the README at https://github.com/QwenLM/Qwen2.5-Coder:
We have updated both the special tokens and their corresponding token ids to maintain consistency with Qwen2.5. The new special tokens are as follows:
{
  "<|fim_prefix|>": 151659,
  "<|fim_middle|>": 151660,
  "<|fim_suffix|>": 151661,
  "<|fim_pad|>": 151662,
  "<|repo_name|>": 151663,
  "<|file_sep|>": 151664,
  "<|im_start|>": 151644,
  "<|im_end|>": 151645
}
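As a quick sanity check (my own, not part of the README, and again assuming transformers is installed), one can compare this table against the tokenizer's actual vocabulary:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
readme_ids = {
    "<|fim_prefix|>": 151659, "<|fim_middle|>": 151660,
    "<|fim_suffix|>": 151661, "<|fim_pad|>": 151662,
    "<|repo_name|>": 151663, "<|file_sep|>": 151664,
    "<|im_start|>": 151644, "<|im_end|>": 151645,
}
for token, token_id in readme_ids.items():
    # convert_tokens_to_ids returns the vocabulary id for a token string
    assert tok.convert_tokens_to_ids(token) == token_id, token
print("all README special-token ids match the tokenizer")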
How should I properly modify config.json, generation_config.json, and tokenizer_config.json?
You don't need to modify these configuration files; we've set them up correctly. If you encounter any bad cases, such as infinite generation, please attach them here, and we might be able to assist you.
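For reference, a minimal sketch (assuming the transformers library) showing that the shipped generation config already lists both stop tokens, so generate() halts on either <|im_end|> or <|endoftext|> without any edits:

from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
print(gen_cfg.eos_token_id)  # [151645, 151643]
print(gen_cfg.pad_token_id)  # 151643

If your serving stack does not read generation_config.json, passing the stop ids explicitly, e.g. model.generate(..., eos_token_id=[151645, 151643]), is a reasonable workaround; eos_token_id is a standard parameter of transformers' generate().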