BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable). So it combines the best of RNNs and transformers: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

Stuck in multi-GPU LoRA finetuning #170

Open PeiyuZ-star opened 11 months ago

PeiyuZ-star commented 11 months ago

```
LoRA additionally training parameter time_mix_r
LoRA training module blocks.38.ffn.key
LoRA training module blocks.38.ffn.receptance
LoRA training module blocks.38.ffn.value
LoRA additionally training module blocks.39.ln1
LoRA additionally training module blocks.39.ln2
LoRA additionally training parameter time_decay
LoRA additionally training parameter time_first
LoRA additionally training parameter time_mix_k
LoRA additionally training parameter time_mix_v
LoRA additionally training parameter time_mix_r
LoRA training module blocks.39.att.key
LoRA training module blocks.39.att.value
LoRA training module blocks.39.att.receptance
LoRA additionally training parameter time_mix_k
LoRA additionally training parameter time_mix_r
LoRA training module blocks.39.ffn.key
LoRA training module blocks.39.ffn.receptance
LoRA training module blocks.39.ffn.value
initializing deepspeed distributed: GLOBAL_RANK: 2, MEMBER: 3/4
[2023-07-31 19:41:57,810] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
initializing deepspeed distributed: GLOBAL_RANK: 3, MEMBER: 4/4
[2023-07-31 19:41:57,811] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
timed out waiting for input: auto-logout
```
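The log stops right after DeepSpeed prints its distributed-init messages, which suggests the ranks never complete the NCCL rendezvous. A minimal sanity check, independent of the RWKV/LoRA code (the script name, world size, and launch command below are illustrative assumptions, not part of this repo), is to run a bare torch.distributed all-reduce across the same GPUs:

```python
# nccl_check.py -- hypothetical standalone script, not part of RWKV-LM.
# Launch with: torchrun --nproc_per_node=4 nccl_check.py
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK and LOCAL_RANK in the environment.
    rank = int(os.environ["RANK"])
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Uses the default env:// rendezvous, same as most training launchers.
    dist.init_process_group(backend="nccl")

    # All-reduce a one-element tensor. If this hangs, the problem is in
    # the NCCL/network setup rather than in the LoRA training script.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t)
    print(f"rank {rank}: all_reduce ok, result={t.item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

If this hangs too, the issue lies in the NCCL/network environment rather than the finetuning code; running with `NCCL_DEBUG=INFO` set usually shows where the rendezvous stalls.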

BlinkDL commented 9 months ago

Hi, you can ask in the LoRA repo.