BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

Gratitude and Inquiries #192

Open 997172286 opened 9 months ago

997172286 commented 9 months ago

Dear Author,

I wanted to reach out and extend my gratitude for creating this remarkable model. It has truly opened up new horizons in my exploration of Large Language Models. I must say, I'm absolutely enamored with it.

Recently, I had the opportunity to test out the v5.2 models, experimenting with sizes ranging from 1.5B to 7B. The performance surpassed even that of v4. Your implementation of the RWKV architecture is indeed as impressive as its reputation suggests.

I do have a few questions:

  1. While exploring the v5.2 models, I noticed that the 3B model seems to demonstrate superior in-context learning abilities compared to the 7B. Could this be because the 7B checkpoint is only 10% trained? (This is an assumption I made based on the checkpoint's naming.)

  2. If I aim to further enhance the in-context learning ability with RWKV, are there any specific considerations or strategies you would recommend, apart from leveraging a specialized dataset?

Once again, I want to express my gratitude for your diligent work. I'm eagerly looking forward to your response.


BlinkDL commented 9 months ago

Try the latest 7B 30%-trained model: https://huggingface.co/BlinkDL/temp/tree/main It's already great :)
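
For anyone else who wants to try that checkpoint, here is a minimal sketch of how it could be loaded for inference with the `rwkv` pip package (ChatRWKV-style). The checkpoint filename, prompt, and sampling settings below are placeholders chosen for illustration, not details from this thread:

```python
# Minimal inference sketch using the `rwkv` pip package (pip install rwkv).
# Set these before importing rwkv.model: they control the JIT/CUDA kernels.
import os
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_CUDA_ON"] = "0"   # set to "1" to build the optional CUDA kernel

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Placeholder path: point this at the .pth checkpoint downloaded from the link above.
model = RWKV(model="RWKV-5-World-7B-30pct.pth", strategy="cuda fp16")

# The world models use the rwkv_vocab_v20230424 tokenizer bundled with the package.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

args = PIPELINE_ARGS(
    temperature=1.0,
    top_p=0.7,
    alpha_frequency=0.25,  # frequency penalty
    alpha_presence=0.25,   # presence penalty
    token_ban=[0],         # never sample token 0
    token_stop=[],
)

prompt = "Question: What is RWKV?\n\nAnswer:"
print(pipeline.generate(prompt, token_count=200, args=args))
```

The same sketch works for comparing the 3B and 7B checkpoints on few-shot prompts; only the model path (and perhaps the `strategy` string, e.g. "cpu fp32" without a GPU) needs to change.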