Open ybalbert001 opened 7 months ago
from FlagEmbedding import BGEM3FlagModel
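I deployed the model for inference this way, but during load testing CPU utilization reaches 200%+ while GPU utilization is only 2%. The card is a T4. Any suggestions?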
[INFO ] WorkerPool - loading model bge_m3_deploy_code (PENDING) on gpu(0) ...
[INFO ] ModelInfo - S3 url found, start downloading from s3://sagemaker-us-west-2-106839800180/LLM-RAG/workshop/bge-m3-model/
[INFO ] ModelInfo - artifacts has been downloaded already: /tmp/.djl.ai/download/7e7c28dd680c3b4e4a6ecd31452e1400a4d944a9
[INFO ] ModelInfo - Available CPU memory: 14172 MB, required: 0 MB, reserved: 500 MB
[INFO ] ModelInfo - Available GPU memory: 15008 MB, required: 0 MB, reserved: 500 MB
[INFO ] ModelInfo - Loading model bge_m3_deploy_code on gpu(0)
[INFO ] WorkerPool - scale up 1 workers (1 - 1)
[INFO ] PyProcess - Start process: 18999 - retry: 0
[INFO ] PyEnv - Found requirements.txt, start installing Python dependencies...
[INFO ] PyEnv - pip install requirements succeed!
[INFO ] Connection - Set CUDA_VISIBLE_DEVICES=0
The Python FlagEmbedding package is at version 1.2.3. Could that be too old?
Which function are you using for inference?
Upgrading FlagEmbedding fixed it.
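For anyone hitting the same symptom, a quick check of the installed package version can confirm whether the environment is still on the release reported above. The following is only a minimal sketch: the helper names (`installed_version`, `parse_version`, `at_or_below`) are made up for illustration, the 1.2.3 baseline comes from this thread and is not an officially documented cutoff, and the optional CUDA check assumes PyTorch is installed.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def parse_version(version):
    """Turn '1.2.3' into (1, 2, 3); keeps only the leading digits of each part."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_or_below(version, baseline):
    """True if `version` is not newer than `baseline` (simple numeric compare)."""
    return parse_version(version) <= parse_version(baseline)


if __name__ == "__main__":
    # Baseline taken from this thread: 1.2.3 showed low GPU utilization.
    baseline = "1.2.3"
    current = installed_version("FlagEmbedding")
    if current is None:
        print("FlagEmbedding is not installed in this environment")
    elif at_or_below(current, baseline):
        print(f"FlagEmbedding {current} is at or below {baseline}; try upgrading")
    else:
        print(f"FlagEmbedding {current} is newer than {baseline}")

    # Optional: confirm that PyTorch can actually see the GPU.
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("PyTorch is not installed; skipping the CUDA check")
```

If the reported version is at or below 1.2.3, bumping the pin in requirements.txt and redeploying is the fix that worked here.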