alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Apache License 2.0
544 stars, 50 forks

Dual A6000 inference: after model inference finishes, one GPU sits at 0% utilization while the other stays at 100% #109

Open zf761 opened 2 months ago

zf761 commented 2 months ago

(Image attachment: 1724837489680)

TOKENIZER_PATH=/DATA/LM_zhangfeng/models/Qwen2-72B-Instruct-AWQ \
CHECKPOINT_PATH=/DATA/LM_zhangfeng/models/Qwen2-72B-Instruct-AWQ \
MODEL_TYPE=qwen_2 \
FT_SERVER_TEST=1 \
CUDA_VISIBLE_DEVICES='2,3' \
START_PORT='18095' \
ENABLE_FAST_GEN=1 \
CONCURRENCY_LIMIT=200 \
PY_LOG_LEVEL=INFO \
TP_SIZE=2 \
WORLD_SIZE=2 \
python3 -m maga_transformer.start_server
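
For anyone trying to reproduce this, a minimal sketch of how the idle-time utilization could be sampled from a script rather than a screenshot. It is not part of rtp-llm: the helper names `parse_utilization` and `query_utilization` are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports a numeric utilization for every GPU. Note that `nvidia-smi` lists physical GPU indices, so with CUDA_VISIBLE_DEVICES='2,3' the entries of interest are probably 2 and 3, though CUDA's device ordering may differ from the `nvidia-smi` ordering on some machines.

```python
import subprocess
import time


def parse_utilization(csv_text):
    """Parse 'index, utilization' CSV lines into {gpu_index: percent}."""
    result = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        result[int(index)] = int(util)
    return result


def query_utilization():
    """Ask nvidia-smi for the current utilization of every GPU (hypothetical helper)."""
    completed = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_utilization(completed.stdout)


if __name__ == "__main__":
    # Sample once per second after the last request has returned.
    for _ in range(5):
        print(query_utilization())
        time.sleep(1)
```

If one of the two serving GPUs keeps reporting 100 while no requests are in flight, that matches the behavior described in this issue.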