THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[BUG/Help] Problem with the API parameter 'use_stream' #1479

Open bpodq opened 1 month ago

bpodq commented 1 month ago

Is there an existing issue for this?

Current Behavior

I tested the API on two servers: one with an A800 GPU (inside Docker, with exclusive GPU access) and one with a 3090 GPU (running directly on the host).

I found that the use_stream parameter has a large impact on the results.

When use_stream is False, the A800 is much faster than the 3090: 10 threads finish in about 15 s on the A800 versus about 35 s on the 3090.

But when use_stream is True, the same 10 threads take about 45 s on the A800 (3x the False case) versus about 30 s on the 3090, so the 3090 is actually faster.

The pattern is the same with 30 and 100 threads.

What could be causing this?

Expected Behavior

No response

Steps To Reproduce

Start the API with python openai_api.py

Then send requests with python openai_api_request2.py

This is a modified copy of python openai_api_request.py; the main changes are:

from threading import Thread

L = []
m = 1   # number of batches
n = 10  # threads per batch

for j in range(m):
    for i in range(n):
        # Toggle the third argument between False and True to switch use_stream
        t = Thread(target=simple_chat, args=(j * 10 + i, prompts[i], False))
        # t = Thread(target=simple_chat, args=(j * 10 + i, prompts[i], True))
        t.start()
        L.append(t)

for t in L:
    t.join()
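The snippet relies on a simple_chat helper defined in openai_api_request.py. For context, a minimal sketch of what such a helper might look like is below; the endpoint URL, port, model name, and payload fields are assumptions based on the OpenAI-compatible interface exposed by openai_api.py, not taken from the issue itself:

```python
import json
from threading import Thread
from urllib import request

# Assumed local endpoint of openai_api.py (port is a guess).
API_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(prompt, use_stream):
    # "stream" is the field that toggles streaming (SSE) responses
    # in OpenAI-style chat-completion APIs.
    return {
        "model": "chatglm-6b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": use_stream,
    }

def simple_chat(idx, prompt, use_stream):
    # Send one chat request and print a short prefix of the raw response.
    data = json.dumps(build_payload(prompt, use_stream)).encode("utf-8")
    req = request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        body = resp.read().decode("utf-8")
    print(f"[{idx}] {body[:80]}")
```

Note that with stream=True the server sends the reply as many small chunks over a long-lived connection, so per-request overhead and scheduling behavior can differ substantially from the non-streaming case, which may be relevant to the timing differences reported above.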

Environment

- OS: 20.04
- Python: 3.10
- Transformers: 4.39.2 and 4.40.1 on the two servers, respectively
- PyTorch: 2.1.2 and 2.0.1, respectively
- CUDA Support: True

Anything else?

No response