BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

Bigger is different! #100

Open linkerlin opened 1 year ago

linkerlin commented 1 year ago

How about trying to train a bigger model using RWKV? 1800B parameters?

radarFudan commented 1 year ago

Curious +1. But I'd guess the right way to scale it up is still unknown: it could be more layers, wider widths, different gating, shortcuts... It probably needs both empirical experiments and theoretical insights.
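For intuition on the layers-vs-widths trade-off mentioned above, here is a rough back-of-the-envelope sketch. It uses the standard 12·d² per-layer estimate for GPT-style transformers and assumes (an assumption, not confirmed by this repo) that RWKV's per-layer weight count is of the same order; `approx_params` and the default vocabulary size are hypothetical choices for illustration.

```python
def approx_params(n_layer: int, d_model: int, vocab_size: int = 50_000) -> int:
    """Rough GPT-style parameter estimate: embedding + 12*d^2 per layer.

    Assumption: RWKV's time-mixing + channel-mixing weights per layer are
    roughly comparable to a transformer block's attention + FFN weights.
    """
    embedding = vocab_size * d_model        # token embedding table
    per_layer = 12 * d_model * d_model      # mixing + feed-forward weights
    return embedding + n_layer * per_layer

# Depth scales the count linearly, width quadratically:
wide = approx_params(n_layer=32, d_model=4096)   # ~6.6B params
deep = approx_params(n_layer=64, d_model=4096)   # ~13.1B params
```

So doubling depth roughly doubles the parameter count, while doubling width roughly quadruples the per-layer cost, which is part of why the "suitable scaling method" question is non-trivial.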