BlinkDL / ChatRWKV

ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model, and it is open source.
Apache License 2.0

Add support for "stop_words" in PIPELINE #160

Open yynil opened 1 year ago

yynil commented 1 year ago

Currently, the PIPELINE class in src/util.py has a "stop_token" argument, a single, specially designated token id that stops generation. In most cases, however, the stop condition should be a list of token ids. For example, if the prompt looks like this: "User: Please design a Chinese recipe from the following ingredients. Generate the dish name and the detailed method, and end the recipe with "完成!" ("Done!"). Ingredients: pork hind leg, green pepper, onion, salt, pepper.\nAssistant: Dish name:" the result should look like this:

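Braised pork hind leg
Ingredients: pork hind leg, green pepper, onion, salt, pepper
Method:
1. Cut the pork hind leg into chunks and blanch them in boiling water to remove the blood.
2. Heat the wok, add cold oil, and stir-fry the onion and green pepper until fragrant.
3. Add the pork chunks and stir-fry until they change color.
4. Season with salt and pepper to taste and keep stir-frying until cooked through.
5. Finish with a drizzle of light soy sauce.
完成! ("Done!")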

For this case, the stop condition should be the token id list of the closing phrase "完成!" ("Done!"), set like this: end_token = pipeline.encode("完成!")

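In the current implementation, this end_token cannot stop generation, because the encoded phrase is a sequence of token ids rather than the single token id that stop_token expects.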

I have just pushed an update to my fork that provides a stop_words implementation.
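
To make the request concrete, here is a minimal sketch of stopping on a multi-token sequence. It is not the code from the fork or from ChatRWKV: the names find_stop, generate_until_stop and next_token are hypothetical, and a scripted toy function stands in for the model so the example runs without any weights. The idea is to check, after each sampled token, whether the generated ids end with any of the stop sequences.

```python
from typing import Callable, Iterable, List, Sequence


def find_stop(generated: Sequence[int],
              stop_sequences: Iterable[Sequence[int]]) -> int:
    """Return the length of the stop sequence that `generated` ends with, or 0."""
    for stop in stop_sequences:
        n = len(stop)
        if n and len(generated) >= n and list(generated[-n:]) == list(stop):
            return n
    return 0


def generate_until_stop(next_token: Callable[[List[int]], int],
                        stop_sequences: List[List[int]],
                        max_tokens: int = 100) -> List[int]:
    """Sample tokens until a stop sequence appears or max_tokens is reached.

    `next_token` is a hypothetical stand-in for one model step plus sampling:
    it receives the ids generated so far and returns the next id.
    """
    out: List[int] = []
    for _ in range(max_tokens):
        out.append(next_token(out))
        matched = find_stop(out, stop_sequences)
        if matched:
            # Design choice in this sketch: drop the stop sequence itself.
            return out[:-matched]
    return out


# Toy "model" that replays a fixed script of token ids.
script = [5, 6, 7, 8, 9, 10, 11]


def scripted(out: List[int]) -> int:
    return script[len(out)]


# Pretend [8, 9] is what pipeline.encode("完成!") returned.
result = generate_until_stop(scripted, stop_sequences=[[8, 9]])
print(result)  # [5, 6, 7]
```

With a real pipeline, each stop sequence would come from something like pipeline.encode("完成!"). One caveat: matching on token ids can miss a stop phrase when the tokenizer splits it differently in context than when it is encoded alone, so comparing against the decoded text may be more robust.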