BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable). So it combines the best of RNNs and transformers: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embeddings.
Apache License 2.0

Converting to huggingface #113

Closed lambdaofgod closed 1 year ago

lambdaofgod commented 1 year ago

sgugger made a Hugging Face class for RWKV.

Do you know by any chance how he converted the model?

I've asked him how he did it. I think it'd be great to document here how RWKV can be used in transformers. A script to convert checkpoints to a PyTorch format like that one would make it straightforward to port your new models :)

apolinario commented 1 year ago

Here is a guide: https://huggingface.co/blog/rwkv#weights-conversion
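At its core, the conversion in that guide boils down to loading the BlinkDL `.pth` state dict, renaming its keys to the scheme the Hugging Face `Rwkv` model classes expect, and saving the result alongside a config. The sketch below illustrates only the key-renaming step; the exact mapping here is an assumption for illustration, not the official converter, so refer to the linked guide for the real script.

```python
# Illustrative sketch of the key-renaming step a BlinkDL -> Hugging Face
# conversion performs. The mapping below is an assumption, not the
# official script's exact logic.

def rename_rwkv_key(key: str) -> str:
    """Map a BlinkDL-style state-dict key to an HF-style one (illustrative)."""
    new_key = key
    # Illustrative renames: "att" -> "attention", "ffn" -> "feed_forward".
    new_key = new_key.replace(".att.", ".attention.")
    new_key = new_key.replace(".ffn.", ".feed_forward.")
    # Assumption: HF wraps the backbone under an "rwkv." prefix,
    # while the LM head stays at the top level.
    if not new_key.startswith("head."):
        new_key = "rwkv." + new_key
    return new_key

# A couple of keys from a toy RWKV checkpoint (values omitted).
state_dict = {
    "blocks.0.att.key.weight": None,
    "blocks.0.ffn.value.weight": None,
    "head.weight": None,
}
converted = {rename_rwkv_key(k): v for k, v in state_dict.items()}
print(sorted(converted))
# → ['head.weight', 'rwkv.blocks.0.attention.key.weight',
#    'rwkv.blocks.0.feed_forward.value.weight']
```

In a real conversion you would apply a mapping like this to every entry of `torch.load(checkpoint)["state_dict"]` (or the raw dict, depending on how the checkpoint was saved) and then save the renamed weights with the matching `config.json`.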

3outeille commented 1 year ago

@lambdaofgod https://huggingface.co/RWKV

Galaxy-Ding commented 1 year ago

> sgugger made huggingface class for RWKV.
>
> Do you know by any chance how he converted the model?
>
> I've asked him how he did it, I think it'd be great to document here how RWKV can be used in transformers. A script to convert to PyTorch like this format would make it straightforward to port your new models :)

Hi, I'm new to this Hugging Face conversion. I now have:

I want to convert it into a Hugging Face model like your repo [https://huggingface.co/RWKV].

What should I do to end up with a repository like yours? Could you give more detail?
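Roughly, a repo like those under https://huggingface.co/RWKV contains the converted weights file plus a `config.json` describing the architecture and the tokenizer files, so `transformers` can load it with its auto classes. The fragment below is an illustrative `config.json` for a small RWKV model; the field names follow my recollection of `RwkvConfig` in `transformers` and the values are placeholders, so check the actual class documentation before relying on them.

```json
{
  "model_type": "rwkv",
  "vocab_size": 50277,
  "context_length": 1024,
  "hidden_size": 768,
  "num_hidden_layers": 12,
  "layer_norm_epsilon": 1e-05,
  "bos_token_id": 0,
  "eos_token_id": 0
}
```

Once the weights are renamed and saved next to a config like this (plus the tokenizer files), the folder can be pushed to the Hub as a model repository.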
