ROCm / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] Errors when running the image rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed_Inference with a Llama-2 model #69

Open sunpian1 opened 3 months ago

sunpian1 commented 3 months ago

Free memory : 19.685547 (GigaBytes)
Total memory: 23.984375 (GigaBytes)
Requested memory: 0.312500 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7f5cbbe00000
Memory access fault by GPU node-1 (Agent handle: 0x564cd7ba91d0) on address 0x7f5ccfe2c000. Reason: Page not present or supervisor privilege.
[2024-02-19 08:36:43,155] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 3349
[2024-02-19 08:36:43,156] [ERROR] [launch.py:322:sigkill_handler] ['/opt/conda/envs/py_3.9/bin/python', '-u', 'test.py', '--local_rank=0'] exits with return code = -6
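
For anyone triaging this, the numbers in the log can be related to each other with a little arithmetic. The sketch below is only a reading of the log, not a diagnosis. It assumes that "GigaBytes" means GiB (2**30 bytes), that the inference workspace begins at the printed WorkSpace address and spans exactly the "Requested memory" size, and that the negative return code follows the Python subprocess convention of "killed by signal N". None of these assumptions are confirmed by the log itself.

```python
import signal

GIB = 2**30  # assumption: the log's "GigaBytes" are GiB

# Values copied from the log above.
workspace_base = 0x7F5CBBE00000  # "WorkSpace:" address
fault_address = 0x7F5CCFE2C000   # address in the "Memory access fault" line
requested_gib = 0.3125           # "Requested memory"
return_code = -6                 # launcher's reported return code

# Assumption: the workspace covers [base, base + requested size).
requested_bytes = int(requested_gib * GIB)
fault_offset = fault_address - workspace_base
overrun = fault_offset - requested_bytes

print(f"requested workspace size: {requested_bytes:#x} bytes")
print(f"fault offset from base:   {fault_offset:#x} bytes")
print(f"distance past the end:    {overrun} bytes ({overrun / 1024:.0f} KiB)")

# Assumption: a negative return code -N means the process died from signal N.
signal_name = signal.Signals(-return_code).name
print(f"return code {return_code} -> {signal_name}")
```

Under those assumptions the faulting address lies a little beyond the end of the requested workspace, which would be consistent with a kernel reading or writing past the buffer, and the -6 return code corresponds to SIGABRT. Whether the workspace really has that extent would need to be checked against the DeepSpeed inference source.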