Closed: zhouyuustc closed this issue 1 month ago
Addendum: adding --backend pytorch to the launch command does not help either; the model still produces nonsensical output.
Another test case, comparing the OpenAI API service deployed with lmdeploy against the API service from another openai script found online:
The difference is even more obvious on more complex test cases. Result returned by the lmdeploy-deployed OpenAI API service:
Result returned by the other openai script:
Can not reproduce with main branch.
--model-name gemma2
can be removed
--model-name gemma2
can be removed
Without a model name, what should be passed as the model parameter in the API request? Previously I passed gemma2; now if I omit it, the request returns an error.
Log:
(gemma2) root@4034937c8c66:/mnt# CUDA_VISIBLE_DEVICES=4 lmdeploy serve api_server /mnt/gemma2 \
--server-port 35554 \
--session-len 8000 \
--max-batch-size 10 \
--log-level INFO
2024-07-23 15:04:10,884 - lmdeploy - WARNING - Fallback to pytorch engine because /mnt/gemma2
not supported by turbomind engine.
2024-07-23 15:04:10,884 - lmdeploy - INFO - input backend=pytorch, backend_config=PytorchEngineConfig(model_name=None, tp=1, session_len=8000, max_batch_size=10, cache_max_entry_count=0.8, eviction_type='recompute', prefill_interval=16, block_size=64, num_cpu_blocks=0, num_gpu_blocks=0, adapters=None, max_prefill_token_num=4096, thread_safe=False, enable_prefix_caching=False, device_type='cuda', download_dir=None, revision=None)
2024-07-23 15:04:10,884 - lmdeploy - INFO - input chat_template_config=None
2024-07-23 15:04:11,637 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='gemma', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-07-23 15:04:11,826 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-07-23 15:04:13,425 - lmdeploy - INFO - Checking model.
2024-07-23 15:04:13,425 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.41.2], but found version: 4.42.2
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 12/12 [00:11<00:00, 1.06it/s]
2024-07-23 15:04:24,950 - lmdeploy - INFO - Patching model.
2024-07-23 15:04:25,547 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=178, num_gpu_blocks=741, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096, enable_prefix_caching=False)
2024-07-23 15:04:27,259 - lmdeploy - INFO - updated backend_config=PytorchEngineConfig(model_name=None, tp=1, session_len=8000, max_batch_size=10, cache_max_entry_count=0.8, eviction_type='recompute', prefill_interval=16, block_size=64, num_cpu_blocks=0, num_gpu_blocks=0, adapters=None, max_prefill_token_num=4096, thread_safe=False, enable_prefix_caching=False, device_type='cuda', download_dir=None, revision=None)
HINT: Please open http://0.0.0.0:35554 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:35554 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:35554 in a browser for detailed api usage!!!
INFO: Started server process [4217]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:35554 (Press CTRL+C to quit)
INFO: 36.33.26.136:46427 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 36.33.26.136:50307 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 36.33.26.136:3201 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 36.33.26.136:50311 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 36.33.26.136:50311 - "POST /v1/chat/completions HTTP/1.1" 422 Unprocessable Entity
INFO: 36.33.26.136:50329 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 36.33.26.136:50329 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 36.33.26.136:50351 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 36.33.26.136:33694 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 36.33.26.136:50382 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 36.33.26.136:50382 - "POST /v1/chat/completions HTTP/1.1" 200 OK
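One thing worth ruling out from the log above is the transformers version warning: 4.42.2 falls outside the [4.33.0 ~ 4.41.2] range that lmdeploy reports as supported, and an unsupported transformers version can plausibly change model behavior. A small sketch (version bounds copied from the warning line, helper name is hypothetical) to check whether a given version is in range:

```python
# Check whether a transformers version falls inside the range the
# lmdeploy warning reports as supported: [4.33.0 ~ 4.41.2].
def in_range(version, low="4.33.0", high="4.41.2"):
    def key(v):
        # Compare versions as tuples of integers, e.g. "4.42.2" -> (4, 42, 2).
        return tuple(int(p) for p in v.split("."))
    return key(low) <= key(version) <= key(high)

print(in_range("4.42.2"))  # version found in the log -> False
print(in_range("4.41.2"))  # upper bound of the supported range -> True
```

If the installed version is out of range, downgrading transformers into the supported range (e.g. `pip install "transformers<=4.41.2"`) would be a reasonable first step before comparing outputs again.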
Fill in the model field in the request JSON, or just set --model-name gemma. gemma and gemma2 share the same chat template.
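A minimal sketch of such a request body, assuming the server was started with --model-name gemma (the host and port are taken from the log above; the message content is illustrative):

```python
import json

# Minimal request payload for the OpenAI-compatible /v1/chat/completions
# endpoint. The "model" field must match the name the server exposes:
# the value passed via --model-name, or whatever GET /v1/models reports.
payload = {
    "model": "gemma",
    "messages": [{"role": "user", "content": "Hello"}],
}
print(json.dumps(payload))
```

This payload could then be POSTed to http://0.0.0.0:35554/v1/chat/completions with a Content-Type: application/json header; omitting the "model" field (or passing a name the server does not serve) is what produces the error the user describes.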
Checklist
Describe the bug
The results gemma2 produces are inaccurate; it replies nonsensically. Yesterday, after deploying with lmdeploy, I found that the gemma2 served by lmdeploy has very poor inference ability, effectively random output (for example ). I have been calling gemma2 through an openai.py script (for example ), and I know gemma2's performance is definitely not this bad. I tested several examples, and gemma2 seemed to have gone from a university student to a primary-school student.
I don't know whether something in my command is wrong. lmdeploy is the latest version 0.5.1, and according to the official docs the PyTorch engine already supports gemma2 (although running `lmdeploy list` on the command line only shows gemma, not gemma2). Please help!
Reproduction
CUDA_VISIBLE_DEVICES=4 lmdeploy serve api_server /mnt/gemma2 --server-port 35554 --model-name gemma2 --session-len 8000 --max-batch-size 10 --log-level INFO
Environment
Error traceback