Gratitude and Inquiries

Dear Author,

I wanted to reach out and extend my gratitude for creating this remarkable model. It has truly opened up new horizons in my exploration of Large Language Models. I must say, I'm absolutely enamored by it.

Recently, I had the opportunity to test out the 5.2 version, experimenting with models ranging from 1.5B to 7B. The performance surpassed even that of the V4. Your implementation of the RWKV technique is indeed as impressive as its reputation suggests.

I do have a few questions:

While exploring the 5.2 version, I noticed that the 3B model seems to demonstrate superior in-context learning abilities compared to the 7B. Could this be attributed to the fact that the 7B model only utilizes 10% of its parameters? (This is an assumption I made based on the nomenclature.)
If I aim to further enhance the in-context learning ability with RWKV, are there any specific considerations or strategies you would recommend, apart from leveraging a specialized dataset?

Once again, I want to express my gratitude for your diligent work. I'm eagerly looking forward to your response.

BlinkDL / RWKV-LM

Gratitude and Inquiries #192