Blealtan / RWKV-LM-LoRA

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). It combines the best of RNN and transformer: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

PSA: Issue with Multi-GPU & CUDA 12.0 #49

Open PicoCreator opened 1 year ago

PicoCreator commented 1 year ago

Currently, with RWKV and DeepSpeed on CUDA 12.0 with multiple GPUs, there seems to be an issue where training "hangs" when DeepSpeed is activated with bf16.

Specifically, it hangs around this line:

[Screenshot, 2023-08-04 9:30:34 AM]

This has been tested and is resolved in CUDA 12.2.
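
For anyone who wants to check their environment before starting a run, here is a minimal sketch. The helper names `parse_cuda_version` and `cuda_hang_status` are made up for illustration and are not part of RWKV-LM-LoRA or DeepSpeed. It only encodes what is reported in this issue: 12.0 hangs, 12.2 is tested as resolved, and anything else is treated as unknown. Note that `torch.version.cuda` is the CUDA version PyTorch was built against, which may differ from the toolkit installed on the system.

```python
from typing import Tuple


def parse_cuda_version(version: str) -> Tuple[int, int]:
    """Turn a string such as '12.0' or '12.2.1' into (major, minor)."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cuda_hang_status(version: str) -> str:
    """Classify a CUDA version using only the reports in this issue.

    Hypothetical helper: 12.0 is where the hang was reported, 12.2 and
    later are assumed to carry the fix, everything else is unknown.
    """
    parsed = parse_cuda_version(version)
    if parsed == (12, 0):
        return "reported"
    if parsed >= (12, 2):
        return "resolved"
    return "unknown"


if __name__ == "__main__":
    try:
        import torch

        built = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        built = None

    if built is None:
        print("No CUDA-enabled PyTorch build found")
    else:
        print(f"PyTorch built with CUDA {built}: {cuda_hang_status(built)}")
```

A status of "unknown" (for example on 12.1 or 11.x) only means this issue does not cover that version, not that it is safe.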