RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings.
Currently, when training RWKV with DeepSpeed, there appears to be an issue where training hangs when DeepSpeed is activated with bf16.

Specifically around this line

This has been tested and found to be resolved in CUDA 12.2.
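For context, the hang is reported when bf16 is enabled in the DeepSpeed config. A minimal sketch of the relevant config section (values other than the `bf16` block are illustrative assumptions, not the project's actual settings):

```json
{
  "bf16": {
    "enabled": true
  },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1
}
```

If you hit the hang on an older CUDA toolkit, upgrading to CUDA 12.2 (as noted above) or temporarily disabling the `bf16` block are the workarounds suggested by this report.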