BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Zero-division error when args.n_layer = 1, caused by ratio_0_to_1. Can I set ratio_0_to_1 = 0 when n_layer = 1? #243

Open zdxdsw opened 2 months ago

zdxdsw commented 2 months ago

Can you intuitively explain what ratio_0_to_1 is doing in RWKV_Tmix_x060? https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v5/src/model.py#L290

I see that ratio_0_to_1 is defined as ratio_0_to_1 = layer_id / (args.n_layer - 1), and it is then used to initialize several of the time_mix and time_decay parameters.

However, my issue is that I want to set args.n_layer = 1, which leads to a zero-division error. Does it make sense to hardcode ratio_0_to_1 = 0 when args.n_layer = 1?

BlinkDL commented 4 days ago

You can hardcode ratio_0_to_1 to 0.5 in this case.
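
For reference, a minimal sketch of the guarded initialization, assuming the variable names (layer_id, args.n_layer) from the linked model.py; this is not the upstream code, just an illustration of the single-layer special case:

```python
# Per-layer interpolation factor used when initializing time_mix / time_decay
# parameters: 0.0 at the first layer, 1.0 at the last layer.
if args.n_layer > 1:
    ratio_0_to_1 = layer_id / (args.n_layer - 1)
else:
    # Single-layer model: the denominator would be zero, so fall back to
    # the midpoint value suggested above.
    ratio_0_to_1 = 0.5
```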