Open ahkimkoo opened 1 year ago
@ahkimkoo it has not been trained on Chinese data, please use only English for now.
Thank you for your reply, but even when I use English, it can't reply normally.
Can you share a screenshot of how you are loading the model, and what inputs you give it?
Because it's not patched. See here how to do that: https://github.com/lm-sys/FastChat/blob/0a827abe0cc60a3733b4406a070beb1ac8d0e5e1/fastchat/model/model_adapter.py#L445
@DachengLi1 I would like to follow up on this. I'm having the same issue, running the same model through the fastchat openai-server implementation. I'm getting the same outputs (sometimes some "A A A A A A A A A A A A" screaming) while running the latest version with the monkey patch applied.
Here are the requests I send to the endpoint and the corresponding outputs:
curl http://localhost:8100/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fattiicazzituoi" \
-d '{
"model": "longchat-13b-16k",
"messages": [{"role": "user", "content": "Say this is a test."}],
"temperature": 0.3, "max_tokens": 200
}'
{"id":"chatcmpl-3tF6uZ7GXm54dmLwfGLQ3y","object":"chat.completion","created":1689243774,"model":"lmsys/longchat-13b-16k","choices":[{"index":0,"message":{"role":"assistant","content":"A A A A A A A A A A A A A A A A A A A A"},"finish_reason":"stop"}],"usage":{"prompt_tokens":45,"total_tokens":64,"completion_tokens":19}}
But if I use the "completions" (non-chat) endpoint the model works "correctly" (or at least it does not scream at me):
curl http://localhost:8100/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer aaaaaaaaaaaa" \
-d '{
"model": "lmsys/longchat-13b-16k",
"prompt": "Say this is a test.",
"max_tokens": 20,
"temperature": 0.5
}'
{"id":"cmpl-sBiuu78WYegWDnU3WDmFmF","object":"text_completion","created":1689245357,"model":"lmsys/longchat-13b-16k","choices":[{"index":0,"text":"\nYou are a test.\n\n\n\n\n\n\n\n\n\n\n\n\n\n","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":7,"total_tokens":26,"completion_tokens":19}}
TL;DR: LongChat-13B-16K goes like this:
@scuty2000 Fun image lol. @merrymercy do you have an idea on this? Is there a difference in logic between completions and chat completions (e.g. load_8_bit, patching)?
@DachengLi1 I don't know if this helps, but I suspect it is related to the int8 quantization. The 7B version without quantization works pretty well.
@scuty2000 Yes, I also heard it elsewhere.
Any update on this?
reply like this: