Tutorial for python script?

RWKV / rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

MIT License

1.37k stars, 90 forks

Hi, sorry, I'm a bit of a noob, but I was wondering how to write a Python script with this. Yes, I know there is an example, but I don't understand it. I would just like a script, and to know how to use it, where I can change the model, tokenizer, temperature, top_p, presence penalty, frequency penalty and max tokens. I would also like a way to give the model a prompt as a string (like prompt = "How are you?") and get the output back as a string too.

So once I had put in all the settings, I would just need to do this:

prompt = "Hi, how are you?"
output = model.calculate(prompt)

Something like that. Just something simple, because I don't understand the chat_with_bot.py script.

Sorry, I'm not that good at Python. I hope someone can help me! Thanks in advance.

Hi! I agree that chat_with_bot.py is somewhat complicated.

There is another script, generate_completions.py, which is only 69 lines long and, as I understand it, does exactly what you asked: it generates a completion for a prompt. It's almost the simplest inference code possible.

Answering specifically:

- to change the model, change the model_path argument to the script
- to change the tokenizer, use the tokenizer argument to the script
- the prompt, tokens_per_generation, temperature and top_p variables do exactly what their names say
- you would need to implement the presence and frequency penalties yourself; here's the commit that introduced them
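
To make the points above concrete, here is a minimal sketch of a "prompt in, text out" helper with all the requested settings. It is not code from rwkv.cpp. The names generate, sample_token, apply_penalties, eval_token, encode and decode are hypothetical, and the sketch only assumes that the model can be driven by a step function that takes (token, state) and returns (logits, state), which is how I understand generate_completions.py uses the model. The penalty formula is the common "subtract presence + count * frequency" variant, and the sampling order (temperature first, then top_p) may differ from the helper shipped in the repository.

```python
import math
import random
from collections import Counter


def apply_penalties(logits, counts, presence_penalty, frequency_penalty):
    """Lower the logit of every token that has already been generated."""
    out = list(logits)
    for token, n in counts.items():
        out[token] -= presence_penalty + n * frequency_penalty
    return out


def sample_token(logits, temperature=0.8, top_p=0.5, rng=random):
    """Temperature + nucleus (top-p) sampling over a plain list of logits."""
    if temperature <= 0:
        # Treat non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=logits.__getitem__)
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp((x - top) / temperature) for x in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def generate(prompt, eval_token, encode, decode, max_tokens=100,
             temperature=0.8, top_p=0.5,
             presence_penalty=0.2, frequency_penalty=0.2, rng=random):
    """Feed the prompt through the model, then sample max_tokens new tokens.

    eval_token, encode and decode are placeholders (assumptions): plug in the
    model's step function and the tokenizer's encode/decode functions.
    """
    tokens = encode(prompt)
    if not tokens:
        raise ValueError("prompt must encode to at least one token")
    logits, state = None, None
    for token in tokens:  # process the prompt one token at a time
        logits, state = eval_token(token, state)
    counts = Counter()  # how often each token has been generated so far
    generated = []
    for _ in range(max_tokens):
        penalized = apply_penalties(logits, counts,
                                    presence_penalty, frequency_penalty)
        token = sample_token(penalized, temperature, top_p, rng)
        generated.append(token)
        counts[token] += 1
        logits, state = eval_token(token, state)
    return decode(generated)
```

With something like this in place, the call from the question becomes output = generate("Hi, how are you?", eval_token, encode, decode), where eval_token would presumably be the loaded model's eval method and encode/decode would come from whichever tokenizer the script loads. Check generate_completions.py for the exact names in your version of the repository.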
