GPU utilization of the BGEM3 model is too low

ybalbert001 commented 7 months ago

from FlagEmbedding import BGEM3FlagModel

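This is how I deploy the model for inference. During load testing, CPU utilization goes above 200% while GPU utilization is only 2%. The card is a T4. Any suggestions?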

[INFO ] WorkerPool - loading model bge_m3_deploy_code (PENDING) on gpu(0) ...
[INFO ] ModelInfo - S3 url found, start downloading from s3://sagemaker-us-west-2-106839800180/LLM-RAG/workshop/bge-m3-model/
[INFO ] ModelInfo - artifacts has been downloaded already: /tmp/.djl.ai/download/7e7c28dd680c3b4e4a6ecd31452e1400a4d944a9
[INFO ] ModelInfo - Available CPU memory: 14172 MB, required: 0 MB, reserved: 500 MB
[INFO ] ModelInfo - Available GPU memory: 15008 MB, required: 0 MB, reserved: 500 MB
[INFO ] ModelInfo - Loading model bge_m3_deploy_code on gpu(0)
[INFO ] WorkerPool - scale up 1 workers (1 - 1)
[INFO ] PyProcess - Start process: 18999 - retry: 0
[INFO ] PyEnv - Found requirements.txt, start installing Python dependencies...
[INFO ] PyEnv - pip install requirements succeed!
[INFO ] Connection - Set CUDA_VISIBLE_DEVICES=0
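
The log shows the serving layer assigning the model to gpu(0), but that alone does not confirm that the Python worker can actually use CUDA. A minimal diagnostic sketch, assuming PyTorch is the backend inside the worker process; the helper name describe_cuda is hypothetical and not part of FlagEmbedding or DJL:

```python
import os


def describe_cuda():
    """Report what the current Python process can see of the GPU.

    Hypothetical helper for illustration only. It degrades gracefully
    when PyTorch is not installed.
    """
    info = {
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": False,
        "cuda_available": False,
        "device_name": None,
    }
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return info

    info["torch_installed"] = True
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # Index 0 is the first device visible to this process.
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_cuda())
```

If cuda_available comes back False inside the worker, the model is most likely running on the CPU, which would be consistent with high CPU and near-zero GPU utilization.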

The Python FlagEmbedding package version is 1.2.3. Could that be too old?

staoxiao commented 7 months ago

Which function are you using for inference?
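
For context, the thread never says which call the deployment used. The sketch below shows the encode path documented in the BGE-M3 README, written so the model is passed in. The model ID "BAAI/bge-m3" is the public hub name and is an assumption here (the thread loads artifacts from S3), and the batch_size and max_length values are illustrative, not recommendations:

```python
def load_model(model_path="BAAI/bge-m3"):
    """Load BGE-M3 with half precision enabled.

    Assumption: model_path is the public hub ID; the deployment in the
    thread would point at its own downloaded artifact directory instead.
    """
    from FlagEmbedding import BGEM3FlagModel  # lazy import: heavy dependency

    return BGEM3FlagModel(model_path, use_fp16=True)


def embed(model, sentences, batch_size=32, max_length=512):
    """Return dense vectors for a list of sentences.

    Passing the whole list in one call lets the library batch the work,
    instead of invoking encode once per sentence.
    """
    output = model.encode(sentences, batch_size=batch_size, max_length=max_length)
    return output["dense_vecs"]


if __name__ == "__main__":
    model = load_model()
    vectors = embed(model, ["What is BGE M3?", "Definition of BM25"])
    print(len(vectors))
```

Calling encode with a list and an explicit batch_size is one way to give the GPU larger units of work per request; whether it changes utilization in a given deployment would need to be measured.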

ybalbert001 commented 7 months ago

Upgrading the FlagEmbedding version fixed it.
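
The thread does not say which release resolved the problem, so the sketch below only shows one way to confirm what is installed in the serving environment. It uses the standard library's importlib.metadata; the helper names installed_version and version_tuple are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Convert the leading numeric parts of a version into a tuple.

    Simplified parsing for illustration: "1.2.3" becomes (1, 2, 3), and a
    suffix such as "rc1" is ignored. It is not a full PEP 440 parser.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


if __name__ == "__main__":
    found = installed_version("FlagEmbedding")
    if found is None:
        print("FlagEmbedding is not installed")
    else:
        # 1.2.3 is the version reported in the thread as showing the problem.
        print(found, "newer than 1.2.3:", version_tuple(found) > (1, 2, 3))
```

In a deployment that installs dependencies from requirements.txt, as the log above indicates, the upgrade would typically be made there, or with pip install -U FlagEmbedding when testing interactively.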
