How to filter some of the personalized data?

OpenMOSS / MOSS

An open-source tool-augmented conversational language model from Fudan University

https://txsun1997.github.io/blogs/moss.html

Apache License 2.0


How to filter some of the personalized data? #247

Open drxmy opened 1 year ago

drxmy commented 1 year ago

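First, thank you for open-sourcing the data. Some samples, such as id=3 in zh_helpfulness or id=6 in zh_honesty, contain statements like "我的创造者是复旦大学自然语言处理实验室和上海人工智能实验室" ("My creators are the Natural Language Processing Lab at Fudan University and the Shanghai Artificial Intelligence Laboratory"). Such identity statements are not suitable for training our own model. Is there a way to filter out these samples other than guessing keywords?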
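
For reference, the keyword-guessing baseline mentioned above could look roughly like the sketch below. It is only an illustration: the record layout (a dict with `id` and `text` fields), the `filter_records` helper, and the pattern list are all assumptions, not the actual MOSS data schema or an official filtering tool.

```python
import re

# Hypothetical identity patterns, taken from the example sentence above.
# A real list would have to be tuned against the actual data.
IDENTITY_PATTERNS = [
    "复旦大学",
    "上海人工智能实验室",
    "Fudan University",
]
IDENTITY_RE = re.compile("|".join(IDENTITY_PATTERNS), re.IGNORECASE)


def mentions_identity(text: str) -> bool:
    """Return True if the text matches any of the identity patterns."""
    return IDENTITY_RE.search(text) is not None


def filter_records(records, text_key="text"):
    """Split records into (kept, dropped) based on the identity patterns.

    `text_key` names the field holding the sample text; this field name
    is an assumption made for the sketch.
    """
    kept, dropped = [], []
    for rec in records:
        if mentions_identity(rec[text_key]):
            dropped.append(rec)
        else:
            kept.append(rec)
    return kept, dropped


# Toy records for illustration only; not real entries from the dataset.
records = [
    {"id": 3, "text": "我的创造者是复旦大学自然语言处理实验室和上海人工智能实验室"},
    {"id": 4, "text": "今天天气很好"},
]
kept, dropped = filter_records(records)
```

Returning the dropped records as well makes it easier to review what was removed and spot false positives before adjusting the patterns.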