alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Apache License 2.0
521 stars, 48 forks

Dual A6000 inference: after model inference finishes, one GPU shows 0% utilization and the other shows 100% #111

Open zf761 opened 1 month ago

zf761 commented 1 month ago

TOKENIZER_PATH=/DATA/LM_zhangfeng/models/Qwen2-72B-Instruct-AWQ \
CHECKPOINT_PATH=/DATA/LM_zhangfeng/models/Qwen2-72B-Instruct-AWQ \
MODEL_TYPE=qwen_2 \
FT_SERVER_TEST=1 \
CUDA_VISIBLE_DEVICES='2,3' \
START_PORT='18095' \
ENABLE_FAST_GEN=1 \
CONCURRENCY_LIMIT=200 \
PY_LOG_LEVEL=INFO \
TP_SIZE=2 \
WORLD_SIZE=2 \
python3 -m maga_transformer.start_server

netaddi commented 2 weeks ago

This is most likely a reporting issue in nvidia-smi rather than real load on the GPU. Check each card's power consumption to see its actual usage.
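One way to compare reported utilization against power draw is sketched below. It is only an illustration, not part of rtp-llm: it calls `nvidia-smi --query-gpu=index,utilization.gpu,power.draw,power.limit --format=csv,noheader,nounits` and flags cards whose power draw is low relative to their limit. The helper names (`parse_gpu_csv`, `looks_idle`, `query_gpus`) and the 30% threshold are assumptions made for this example, and the parser assumes every field is numeric (some GPUs report `[N/A]` for power).

```python
import subprocess

# Fields requested from nvidia-smi; all four are standard --query-gpu properties.
QUERY_FIELDS = "index,utilization.gpu,power.draw,power.limit"


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts.

    Assumes every field is numeric; a value such as "[N/A]" would raise ValueError.
    """
    rows = []
    for line in text.strip().splitlines():
        index, util, draw, limit = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "util_pct": float(util),
            "power_w": float(draw),
            "limit_w": float(limit),
        })
    return rows


def looks_idle(row, idle_fraction=0.3):
    """Heuristic: treat a GPU as idle if it draws under idle_fraction of its power limit.

    The 0.3 default is an arbitrary illustrative threshold, not a measured value.
    """
    return row["power_w"] < idle_fraction * row["limit_w"]


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for row in query_gpus():
        note = ""
        if row["util_pct"] >= 90 and looks_idle(row):
            note = "  <- high reported utilization but low power draw"
        print(f"GPU {row['index']}: util={row['util_pct']:.0f}% "
              f"power={row['power_w']:.1f}W / {row['limit_w']:.1f}W{note}")
```

If a card shows 100% utilization while its power draw sits near idle levels, that supports the reading that the utilization figure is misleading rather than that the GPU is doing work.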