deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself
https://coder.deepseek.com/
MIT License

Why does the GGUF model generate "GGGGG..." when the input string is longer than a certain length? #151

Open hzgdeerHo opened 6 months ago

hzgdeerHo commented 6 months ago

With TheBloke/deepseek-coder-33B-instruct-GGUF (deepseek-coder-33b-instruct.Q6_K.gguf), when I use llama-cpp-python to load the model, it generates an endless "GGGG...". But it works normally when the input question is shorter than about 1000-2000 words.

    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id=args.model_name_or_path,
        # chat_format="llama-2",
        # chat_format="alpaca",
        filename="deepseek-coder-33b-instruct.Q6_K.gguf",
        n_ctx=16000,
        tokenizer=tokenizer,
        n_gpu_layers=-1,
        verbose=True,
    )

    output = llm(
        user_prompt,
        stream=True,
        max_tokens=12000,
        # max_new_tokens=4096,
        # do_sample=False,
        top_k=50,
        top_p=0.95,
        # num_return_sequences=1,
        # eos_token_id=tokenizer.eos_token_id,
        # repeat_penalty=1,
    )

    bot_message = ""
    for chunk in output:
        delta = chunk['choices'][0]['text']
        print(delta, end='')
        bot_message += delta
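One thing worth noting about the settings above: with n_ctx=16000 and max_tokens=12000, a prompt of a few thousand tokens can already push prompt + generation past the context window, which is one common trigger for degenerate repeated-token output. A minimal sketch of budgeting the generation length (clamp_max_tokens is a hypothetical helper, not part of the repro above):

```python
def clamp_max_tokens(prompt_tokens: int, n_ctx: int, requested: int) -> int:
    """Clamp the requested generation length so that the prompt plus
    the generated tokens still fit inside the model's context window."""
    available = n_ctx - prompt_tokens
    return max(0, min(requested, available))

# Example: a 5000-token prompt in a 16000-token context leaves
# at most 11000 tokens for generation, even if 12000 were requested.
print(clamp_max_tokens(5000, 16000, 12000))
```

The prompt token count can be obtained from the tokenizer before calling the model, e.g. len(llm.tokenize(user_prompt.encode())).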
fubz commented 5 months ago

Just know you are not the only one that experiences this.

tastypear commented 2 months ago

Give it a try: turn on flash attention.

anrgct commented 2 months ago

I encountered the same issue when using LM Studio version 0.2.27. However, when I ran the model with text-generation-webui, it worked normally and didn't produce the "GGGG" output, even with long context.