OpenBMB / XAgent

An Autonomous LLM Agent for Complex Task Solving
https://blog.x-agent.net/blog/xagent/
Apache License 2.0

XAgentGen: Can XAgentLlaMa-34B-preview run inference directly across multiple GPUs? #355

Closed · Turingforce closed this issue 6 months ago

Turingforce commented 6 months ago

Question: How can XAgentLlaMa-34B-preview be run on an n × 24 GB 3090 (or 4090) configuration, and what are the VRAM requirements?
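As a rough back-of-the-envelope estimate (not an official figure from the XAgent team), the fp16/bf16 weights of a 34B-parameter model alone occupy about 63 GiB, so at least three 24 GiB cards would be needed just to hold the weights, before counting activations and KV cache:

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough weight footprint in GiB at fp16/bf16 (weights only, no KV cache)."""
    return n_params * bytes_per_param / 2**30

mem = weight_memory_gib(34e9)   # XAgentLlaMa-34B at 2 bytes per parameter
print(f"{mem:.1f} GiB")         # ~63.3 GiB, far more than one 24 GiB card
cards = int(-(-mem // 24))      # ceiling division by per-card VRAM
print(cards)                    # at least 3 cards for the weights alone
```

This is why a single-GPU run OOMs immediately; the model has to be sharded across GPUs, and the real requirement is higher once inference buffers are included.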

Run:

docker run -it -p 13520:13520 --network tool-server-network -v /mnt/XAgentLLaMa-34B-preview:/model:rw --gpus all --ipc=host xagentteam/xagentgen:latest python app.py --model-path /model --port 13520

Log:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 688.00 MiB. GPU 0 has a total capacty of 23.69 GiB of which 473.19 MiB is free. Process 59146 has 23.21 GiB memory in use. Of the allocated memory 22.75 GiB is allocated by PyTorch, and 9.99 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
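The log's hint about max_split_size_mb targets allocator fragmentation; it can be passed into the container as an environment variable, but note that it cannot make a ~63 GiB model fit on one 24 GiB GPU, so in this case the weights still need to be sharded across cards. A sketch of the same docker invocation with the variable set (the 128 MiB value is an arbitrary starting point, not a tested setting):

```shell
# Forward the allocator hint from the error message into the container via -e.
# This only mitigates fragmentation; it does not reduce total memory needed.
docker run -it -p 13520:13520 --network tool-server-network \
  -v /mnt/XAgentLLaMa-34B-preview:/model:rw --gpus all --ipc=host \
  -e PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 \
  xagentteam/xagentgen:latest \
  python app.py --model-path /model --port 13520
```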

Test configuration / GPUs:

[screenshot of the GPU configuration]

Umpire2018 commented 6 months ago

Please refer to https://github.com/OpenBMB/XAgent/issues/248 and https://github.com/OpenBMB/XAgent/issues/275