Open xwdreamer opened 1 year ago
GPUs used: A100
nvidia-smi
Tue Aug 22 12:59:05 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 12.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... On | 00000000:41:00.0 Off | 0 |
| N/A 32C P0 66W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM... On | 00000000:42:00.0 Off | 0 |
| N/A 33C P0 66W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
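Same problem here. I'm using 4×V100 (32 GB) and get torch.cuda.OutOfMemoryError: CUDA out of memory.
Has this been solved? I'm hitting the same problem. For full-parameter fine-tuning, how much GPU memory is actually needed? With fp16 training, even 80 GB goes OOM.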
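On the question of how much memory full-parameter fine-tuning needs, a rough back-of-the-envelope sketch may help. It uses the commonly cited accounting for mixed-precision Adam: about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights plus the two Adam moments), before any activations. The thread does not state the model size, so the 6B parameter count below is purely an assumption for illustration, and `model_state_gib` is a hypothetical helper, not part of any library. The ZeRO branches only approximate how DeepSpeed partitions these states across GPUs.

```python
# Rough per-GPU estimate of model-state memory for full-parameter fine-tuning
# with mixed-precision Adam. Activations, temporary buffers and fragmentation
# are NOT included, so real usage will be higher.

BYTES_FP16_WEIGHTS = 2     # fp16 copy of the weights
BYTES_FP16_GRADS = 2       # fp16 gradients
BYTES_FP32_OPTIMIZER = 12  # fp32 master weights + Adam momentum + Adam variance


def model_state_gib(num_params, num_gpus=1, zero_stage=0):
    """Return the estimated model-state memory per GPU, in GiB."""
    weights = BYTES_FP16_WEIGHTS * num_params
    grads = BYTES_FP16_GRADS * num_params
    optim = BYTES_FP32_OPTIMIZER * num_params
    if zero_stage >= 1:   # stage 1 partitions optimizer states
        optim /= num_gpus
    if zero_stage >= 2:   # stage 2 also partitions gradients
        grads /= num_gpus
    if zero_stage >= 3:   # stage 3 also partitions the weights
        weights /= num_gpus
    return (weights + grads + optim) / 2**30


if __name__ == "__main__":
    n = 6_000_000_000  # ASSUMPTION: hypothetical 6B-parameter model
    for stage in (0, 1, 2, 3):
        gib = model_state_gib(n, num_gpus=2, zero_stage=stage)
        print(f"ZeRO stage {stage}, 2 GPUs: {gib:.1f} GiB per GPU")
```

Under that assumption, the unpartitioned model states alone come to roughly 89 GiB, which is already more than the 81251 MiB a single A100 reports, so an OOM without ZeRO partitioning or offloading would not be surprising. Treat the numbers as a lower bound rather than a prediction.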
Is there an existing issue for this?
Current Behavior
Inspect the ds_train_finetune.sh file
Run the fine-tuning
An error is raised
Expected Behavior
No response
Steps To Reproduce
.
Environment
python -c "import torch; print(torch.cuda.is_available())"
Output: True
Anything else?
No response