First, thank you for open-sourcing the data.
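Some samples, such as id=3 in zh_helpfulness and id=6 in zh_honesty, contain self-identifying statements like "我的创造者是复旦大学自然语言处理实验室和上海人工智能实验室" ("My creators are the Fudan University Natural Language Processing Lab and the Shanghai AI Laboratory"). This is a problem when training our own model. Is there a way to filter out such samples other than guessing keywords?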
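For reference, the keyword approach I would like to avoid looks roughly like the sketch below. It is only an illustration: the keyword list is a guess based on the two samples above, `mentions_identity` and `filter_records` are hypothetical helper names, and I am assuming each sample can be loaded as a JSON-like dict. It searches the whole serialized record so that it does not depend on any particular field names.

```python
import json

# Hypothetical keyword list, guessed from the two samples mentioned above.
# It is almost certainly incomplete, which is the problem with this approach.
IDENTITY_KEYWORDS = [
    "复旦大学自然语言处理实验室",
    "上海人工智能实验室",
]


def mentions_identity(record, keywords=IDENTITY_KEYWORDS):
    """Return True if any keyword appears anywhere in the serialized record."""
    # ensure_ascii=False keeps Chinese characters unescaped so substring
    # matching works on the raw text.
    text = json.dumps(record, ensure_ascii=False)
    return any(keyword in text for keyword in keywords)


def filter_records(records, keywords=IDENTITY_KEYWORDS):
    """Keep only the records that match none of the keywords."""
    return [r for r in records if not mentions_identity(r, keywords)]
```

This drops any sample that names either lab, but it misses paraphrases and any other self-identifying text that the list does not cover.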