RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings.
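The RNN/GPT duality above can be sketched with a toy linear recurrence (this is a simplification, not the actual RWKV time-mixing formulas; the decay `w` and the inputs are made-up illustration values). Inference streams token by token with a constant-size state, while training can evaluate the same scan as a closed-form weighted sum over the whole sequence, which is what makes it parallelizable:

```python
# Toy EMA-style recurrence, NOT the real RWKV kernel: it only illustrates
# why RNN-mode inference needs O(1) state per channel (hence low VRAM and
# no hard ctx_len limit), while training can unroll the same scan in
# parallel like a GPT.
def rnn_step(state, x, w=0.9):
    # The state carries everything needed from the past -> constant memory.
    return w * state + (1.0 - w) * x

xs = [0.5, -1.0, 2.0, 0.25]  # a short made-up input sequence

# Sequential "inference" mode: one scalar state, streamed token by token.
state = 0.0
for x in xs:
    state = rnn_step(state, x)

# "Training" mode: the final state as a closed-form weighted sum, which
# is what lets the scan be computed in parallel over the whole sequence.
T, w = len(xs), 0.9
parallel = sum((1.0 - w) * w ** (T - 1 - t) * xs[t] for t in range(T))
assert abs(state - parallel) < 1e-12
```

The real model replaces this scalar decay with learned per-channel time-mixing and a nonlinear readout, but the same recurrence-vs-unrolled-sum equivalence is what gives both the fast RNN inference and the GPT-style parallel training.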
Recent research has shown that more pretraining data can lead to better performance. RWKV has only used the relatively small Pile dataset. Has there been any consideration of using the larger SlimPajama dataset for pretraining and fairly comparing with LLaMA and OpenLLaMA?