Dual A6000 inference: after model inference finishes, one GPU shows 0% utilization and the other shows 100%

alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Apache License 2.0


Dual A6000 inference: after model inference finishes, one GPU shows 0% utilization and the other shows 100% #111

Closed: zf761 closed this issue 3 weeks ago

zf761 commented 2 months ago

[Attached image: 1724837489680]

TOKENIZER_PATH=/DATA/LM_zhangfeng/models/Qwen2-72B-Instruct-AWQ \
CHECKPOINT_PATH=/DATA/LM_zhangfeng/models/Qwen2-72B-Instruct-AWQ \
MODEL_TYPE=qwen_2 \
FT_SERVER_TEST=1 \
CUDA_VISIBLE_DEVICES='2,3' \
START_PORT='18095' \
ENABLE_FAST_GEN=1 \
CONCURRENCY_LIMIT=200 \
PY_LOG_LEVEL=INFO \
TP_SIZE=2 \
WORLD_SIZE=2 \
python3 -m maga_transformer.start_server

netaddi commented 2 months ago

This is most likely an artifact of how nvidia-smi reports utilization, not real load on the card. Check the GPU's power consumption to see its actual usage.
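One way to follow this suggestion is to read utilization and power draw side by side. The sketch below is only an illustration, not part of RTP-LLM: it calls `nvidia-smi --query-gpu` with the `index`, `utilization.gpu`, `power.draw`, and `power.limit` fields and parses the CSV output. The helper names and the `idle_fraction=0.3` threshold are assumptions made for this example; pick a threshold that matches your card's real idle power.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY_FIELDS = "index,utilization.gpu,power.draw,power.limit"


def _to_float(value):
    """Convert a CSV cell to float; return None for cells such as "[N/A]"."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_stats(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        cells = [c.strip() for c in line.split(",")]
        if len(cells) != 4:
            continue  # skip lines that do not match the four requested fields
        index, util, draw, limit = cells
        stats.append({
            "index": int(index),
            "util_percent": _to_float(util),
            "power_draw_w": _to_float(draw),
            "power_limit_w": _to_float(limit),
        })
    return stats


def looks_idle(gpu, idle_fraction=0.3):
    """Heuristic (assumption): power draw well below the limit suggests the GPU
    is not doing real work, whatever the utilization column says.
    Returns None when power readings are unavailable."""
    draw, limit = gpu["power_draw_w"], gpu["power_limit_w"]
    if draw is None or limit is None or limit <= 0:
        return None
    return draw < idle_fraction * limit


def query_gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU stats."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)
```

Calling `query_gpu_stats()` on the inference host and passing each entry to `looks_idle` gives a rough cross-check: a card that reports 100% utilization but draws only a small fraction of its power limit is probably not running kernels.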