CUDA_LAUNCH_BLOCKING=1, TORCH_USE_CUDA_DSA

OpenMOSS / MOSS

An open-source tool-augmented conversational language model from Fudan University

https://txsun1997.github.io/blogs/moss.html

Apache License 2.0


CUDA_LAUNCH_BLOCKING=1, TORCH_USE_CUDA_DSA #328

Open lhtpluto opened 1 year ago

lhtpluto commented 1 year ago

Running bash run.sh finetune_moss.py fails with the following exception:

RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

CUDA and PyTorch versions: I tested many combinations, and the error is always the same.
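The error message suggests passing CUDA_LAUNCH_BLOCKING=1. As a minimal sketch of one way to do that from Python, the helper below (a hypothetical name, not part of MOSS or PyTorch) builds an environment with the variable set so it can be passed to a child process. The variable makes CUDA kernel launches synchronous, so the stack trace points at the call that actually failed. It only helps locate the failure and does not change any memory limit.

```python
import os


def with_blocking_launches(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set.

    base_env defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Usage (not executed here; the command is the one quoted in this issue):
#   import subprocess
#   subprocess.run(["bash", "run.sh", "finetune_moss.py"],
#                  env=with_blocking_launches(), check=True)
```

The same effect can be had by prefixing the shell command with CUDA_LAUNCH_BLOCKING=1.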

lhtpluto commented 1 year ago

Confirmed: it was a WSL memory configuration problem.

The problem is now solved.

tonycbcd commented 1 year ago

How did you change this WSL? @lhtpluto

lhtpluto commented 1 year ago

How did you change this WSL? @lhtpluto

My English is not good, so I cannot understand this.

lhtpluto commented 1 year ago

How did you solve the WSL memory configuration problem, and what steps should be taken? @lhtpluto

Create a new file named .wslconfig under C:\Users\<username>\

Example .wslconfig contents:

[wsl2]
memory=480GB
swap=32GB
processors=56
localhostForwarding=true
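For anyone who prefers to generate the file rather than type it, here is a minimal sketch using Python's standard configparser. The function name render_wslconfig is hypothetical, and the values are simply the ones from the example above; adjust them to your own machine.

```python
import configparser
import io


def render_wslconfig(memory, swap, processors, localhost_forwarding=True):
    """Render the text of a .wslconfig file with a single [wsl2] section."""
    parser = configparser.ConfigParser()
    # Keep key case as written; the default would lowercase localhostForwarding.
    parser.optionxform = str
    parser["wsl2"] = {
        "memory": memory,
        "swap": swap,
        "processors": str(processors),
        "localhostForwarding": "true" if localhost_forwarding else "false",
    }
    buf = io.StringIO()
    # Write key=value without spaces around "=", matching the example above.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


text = render_wslconfig("480GB", "32GB", 56)
print(text)

# To install it (run with Windows Python, not inside WSL):
#   from pathlib import Path
#   (Path.home() / ".wslconfig").write_text(text)
```

WSL reads .wslconfig at startup, so run wsl --shutdown from Windows and reopen the distribution for the new limits to take effect.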

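========================
Note that WSL appears to support only 64 threads, and DeepSpeed does not support hyper-threading, so when using a W9-3495X you need to disable hyper-threading in the BIOS.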
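To check whether the new limits were picked up, one option is to look at what the Linux side of WSL actually sees. The sketch below is an illustration with hypothetical helper names: it reads MemTotal from /proc/meminfo and the logical CPU count from os.cpu_count(), and is meant to be run inside the WSL distribution.

```python
import os


def parse_mem_total_gib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in GiB.

    The kernel reports the value in kB (units of 1024 bytes).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


def report():
    """Print the memory and logical CPU count visible inside WSL."""
    with open("/proc/meminfo") as f:
        mem_gib = parse_mem_total_gib(f.read())
    print(f"MemTotal: {mem_gib:.1f} GiB, logical CPUs: {os.cpu_count()}")


if __name__ == "__main__":
    report()
```

If the reported memory is far below the memory= value in .wslconfig, the file was probably not read, for example because WSL was not restarted.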