DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
[BUG] I tried the image rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed_Inference with a Llama-2 model and got errors #69
Open
sunpian1 opened 3 months ago
Free memory : 19.685547 (GigaBytes)
Total memory: 23.984375 (GigaBytes)
Requested memory: 0.312500 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7f5cbbe00000
Memory access fault by GPU node-1 (Agent handle: 0x564cd7ba91d0) on address 0x7f5ccfe2c000. Reason: Page not present or supervisor privilege.
[2024-02-19 08:36:43,155] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 3349
[2024-02-19 08:36:43,156] [ERROR] [launch.py:322:sigkill_handler] ['/opt/conda/envs/py_3.9/bin/python', '-u', 'test.py', '--local_rank=0'] exits with return code = -6
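
A note on the final log line, in case it helps triage: if the launcher reports the child's return code using Python's `subprocess` convention (an assumption; the log itself does not say so), a negative value -N means the process was terminated by signal N. The sketch below decodes such a code with the standard `signal` module. The helper name `describe_return_code` is hypothetical and is not part of DeepSpeed.

```python
import signal


def describe_return_code(return_code: int) -> str:
    """Describe a child-process return code (hypothetical helper).

    Assumes the Python subprocess convention on POSIX:
    a negative value -N means "terminated by signal N".
    """
    if return_code < 0:
        # Map the signal number back to its symbolic name, e.g. 6 -> SIGABRT on Linux.
        return f"terminated by signal {signal.Signals(-return_code).name}"
    if return_code == 0:
        return "exited normally"
    return f"exited with status {return_code}"


if __name__ == "__main__":
    # The value reported in the log above.
    print(describe_return_code(-6))
```

Under that assumption, -6 on Linux would correspond to SIGABRT, meaning the process aborted after the GPU memory access fault rather than exiting on its own with an error status.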