DachengLi1 / LongChat

Official repository for LongChat and LongEval
Apache License 2.0

longchat-13b-16k chat does not work #14

Open ahkimkoo opened 1 year ago

ahkimkoo commented 1 year ago

The model replies like this:

xxxxxxxxxx

DachengLi1 commented 1 year ago

@ahkimkoo it has not been trained on Chinese data, so please use English only for now.

ahkimkoo commented 1 year ago

> @ahkimkoo it has not been trained on Chinese data, so please use English only for now.

Thank you for your reply, but even when I use English it cannot reply normally.

DachengLi1 commented 1 year ago

Can you share a screenshot of how you are loading the model and what inputs you give?

musabgultekin commented 1 year ago

Because it's not patched. See here for how to do that: https://github.com/lm-sys/FastChat/blob/0a827abe0cc60a3733b4406a070beb1ac8d0e5e1/fastchat/model/model_adapter.py#L445
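Roughly, the adapter linked above applies the RoPE "condense" monkey patch before the weights are loaded. Below is a minimal sketch of doing the same thing by hand; the function and attribute names are taken from FastChat around that commit and should be treated as assumptions, so check them against your installed version.

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from fastchat.model.llama_condense_monkey_patch import replace_llama_with_condense

model_path = "lmsys/longchat-13b-16k"
config = AutoConfig.from_pretrained(model_path)

# Patch LLaMA's rotary embeddings with the interpolated ("condensed") version
# BEFORE instantiating the model; without this, 16K-context generation degenerates.
# The ratio is 16384 / 2048 = 8 for the 16k checkpoints; fall back to 8 if the
# config does not carry it (assumption, verify against the model config).
replace_llama_with_condense(getattr(config, "rope_condense_ratio", 8))

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path)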

scuty2000 commented 1 year ago

@DachengLi1 I would like to follow up on this. I'm having the same issue running the same model through FastChat's OpenAI API server implementation. I get the same kind of output (sometimes "A A A A A A A A A A A A" screaming), even on the latest version with the monkey patch applied.

Here are the requests I send to the endpoint and the corresponding output:

curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer fattiicazzituoi" \
  -d '{
     "model": "longchat-13b-16k",
     "messages": [{"role": "user", "content": "Say this is a test."}],
     "temperature": 0.3, "max_tokens": 200
   }'

{"id":"chatcmpl-3tF6uZ7GXm54dmLwfGLQ3y","object":"chat.completion","created":1689243774,"model":"lmsys/longchat-13b-16k","choices":[{"index":0,"message":{"role":"assistant","content":"A A A A A A A A A A A A A A A A A A A A"},"finish_reason":"stop"}],"usage":{"prompt_tokens":45,"total_tokens":64,"completion_tokens":19}}

But if I use the "completions" (non-chat) endpoint the model works "correctly" (or at least it does not scream at me):

curl http://localhost:8100/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer aaaaaaaaaaaa" \
  -d '{
    "model": "lmsys/longchat-13b-16k",
    "prompt": "Say this is a test.",
    "max_tokens": 20,
    "temperature": 0.5
  }'

{"id":"cmpl-sBiuu78WYegWDnU3WDmFmF","object":"text_completion","created":1689245357,"model":"lmsys/longchat-13b-16k","choices":[{"index":0,"text":"\nYou are a test.\n\n\n\n\n\n\n\n\n\n\n\n\n\n","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":7,"total_tokens":26,"completion_tokens":19}}

TL;DR: LongChat-13B-16K goes like this: [image attachment]
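For what it's worth, the two endpoints do not feed the model the same text: /v1/chat/completions wraps the messages in the model's conversation template before generating, while /v1/completions sends the prompt verbatim. A rough sketch of what the chat endpoint actually generates from, assuming the longchat models resolve to FastChat's vicuna_v1.1 template (an assumption; check get_conversation_template in your version):

from fastchat.conversation import get_conv_template

# Hypothetical illustration: reconstruct the prompt the chat endpoint builds.
conv = get_conv_template("vicuna_v1.1")
conv.append_message(conv.roles[0], "Say this is a test.")  # user turn
conv.append_message(conv.roles[1], None)                   # empty assistant turn

# /v1/chat/completions generates from this templated prompt ...
print(conv.get_prompt())
# ... whereas /v1/completions generates directly from the raw string
# "Say this is a test." with no template applied.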

DachengLi1 commented 1 year ago

@scuty2000 Fun image lol. @merrymercy do you have an idea on this? Is there a difference in loading between completions and chat completions (e.g. load_8bit, patching)?

scuty2000 commented 1 year ago

@DachengLi1 I don't know if this helps, but I suspect it is related to the int8 quantization. Using the 7B version without quantization works pretty well.
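One way to test that hypothesis is to load the same checkpoint once in fp16 and once in int8 and compare generations on the same prompt. A hypothetical test script, not from this repo, assuming bitsandbytes is installed and using the same condense-patch names as in the sketch above:

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from fastchat.model.llama_condense_monkey_patch import replace_llama_with_condense

model_path = "lmsys/longchat-13b-16k"
config = AutoConfig.from_pretrained(model_path)
replace_llama_with_condense(getattr(config, "rope_condense_ratio", 8))
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

prompt = "USER: Say this is a test. ASSISTANT:"  # rough vicuna-style prompt

for load_8bit in (False, True):
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,
        load_in_8bit=load_8bit,  # int8 path suspected of producing "A A A ..."
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.3)
    print(f"load_8bit={load_8bit}:",
          tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
    del model
    torch.cuda.empty_cache()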

DachengLi1 commented 1 year ago

@scuty2000 Yes, I have heard that elsewhere as well.

scuty2000 commented 1 year ago

Any update on this?