RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0
RuntimeError: CUDA error: an illegal memory access was encountered #79
Hi, when I try to fine-tune the model using the following command:

I get the following error message:

Traceback (most recent call last):
  File "/home/cahya/_Work/RWKV-LM/RWKV-v4neo/train.py", line 350, in <module>
    trainer.fit(model, data_loader)
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 63, in _call_and_handle_interrupt
    trainer._teardown()
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1175, in _teardown
    self.strategy.teardown()
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 490, in teardown
    super().teardown()
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/pytorch_lightning/strategies/parallel.py", line 125, in teardown
    super().teardown()
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 496, in teardown
    self.lightning_module.cpu()
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 78, in cpu
    return super().cpu()
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 954, in cpu
    return self._apply(lambda t: t.cpu())
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 954, in <lambda>
    return self._apply(lambda t: t.cpu())
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

What could be wrong here? It works when I fine-tuned the 169M model. The GPU is an A100.
Thanks
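CUDA errors are reported asynchronously, so the Python traceback above (which ends in a teardown `.cpu()` call) likely points far from the kernel that actually faulted. A common first debugging step, sketched below with a hypothetical re-run of the failing command, is to set `CUDA_LAUNCH_BLOCKING=1` so kernel launches become synchronous and the trace lands on the faulting op:

```shell
# Make CUDA kernel launches synchronous so the Python stack trace
# points at the kernel that actually faulted, not a later teardown call.
export CUDA_LAUNCH_BLOCKING=1

# Device-side assertions (TORCH_USE_CUDA_DSA, per the error message) are a
# compile-time option; they only help if PyTorch was built with them enabled.

# Hypothetical re-run of the failing fine-tuning command:
# python train.py <your original arguments>
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```

With blocking launches enabled, the run will be slower, but the reported stack trace should identify the operation that triggered the illegal memory access.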