RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings.
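A minimal sketch of why this dual claim holds, not the actual RWKV-LM code: a simplified WKV-style time-mixing recurrence can be evaluated two equivalent ways, as a step-by-step RNN with constant-size state (fast, low-VRAM inference) or as a sum over all past positions (the GPT-like, parallelizable training form). The names `w`, `u`, `k`, `v` follow the RWKV paper's notation; the shapes, the `decay = exp(-exp(w))` parameterization, and the O(T^2) parallel loop are illustrative assumptions.

```python
import numpy as np

def wkv_recurrent(w, u, k, v):
    """RNN / inference form: O(T) loop, O(1) state per channel."""
    T, C = k.shape
    out = np.zeros((T, C))
    num = np.zeros(C)   # running decayed sum of exp(k_i) * v_i
    den = np.zeros(C)   # running decayed sum of exp(k_i)
    decay = np.exp(-np.exp(w))          # per-channel decay in (0, 1)
    for t in range(T):
        bonus = np.exp(u + k[t])        # extra weight for the current token
        out[t] = (num + bonus * v[t]) / (den + bonus)
        num = decay * num + np.exp(k[t]) * v[t]
        den = decay * den + np.exp(k[t])
    return out

def wkv_parallel(w, u, k, v):
    """Training form: each position attends to all past positions at once
    (written as an O(T^2) loop here for clarity)."""
    T, C = k.shape
    out = np.zeros((T, C))
    for t in range(T):
        idx = np.arange(t)
        # weight on past token i decays with distance (t - 1 - i)
        wts = np.exp(-np.exp(w) * (t - 1 - idx)[:, None] + k[:t])
        bonus = np.exp(u + k[t])
        out[t] = ((wts * v[:t]).sum(0) + bonus * v[t]) / (wts.sum(0) + bonus)
    return out

# The two forms agree on random inputs.
T, C = 8, 4
rng = np.random.default_rng(0)
w, u = rng.normal(size=C), rng.normal(size=C)
k, v = rng.normal(size=(T, C)), rng.normal(size=(T, C))
assert np.allclose(wkv_recurrent(w, u, k, v), wkv_parallel(w, u, k, v))
```

The recurrent form is what makes ctx_len effectively unbounded at inference time: the state (`num`, `den`) is the same size no matter how many tokens have been consumed.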
[dev-infctx][batch 3] Working bptt_learning across multiple-GPU #44