GCYZSL / MoLA


About the NLP part of the experiments #8

Closed login256 closed 3 months ago

login256 commented 3 months ago

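The MRPC dataset used in the experiments is a sequence classification task. Reading the code, I found no modifications to peft and no changes to the sequence classification part. What format did you convert the dataset into for this part?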

GCYZSL commented 3 months ago

You can refer to the Data Preparation Scripts, which describe in detail the format the data needs to be processed into. This is how I processed it:

# Each MRPC sample has "sentence1", "sentence2", and an integer "label" (0 or 1).
hypothesis = data_sample["sentence1"]
premise = data_sample["sentence2"]
# Map the integer label to text: 0 -> "not equivalent", 1 -> "equivalent".
answer = ["not equivalent", "equivalent"][data_sample["label"]]
print(data_sample["label"], answer)
# Rebuild the sample in the instruction format.
data_sample = {}
data_sample['input'] = ""
data_sample['instruction'] = f"Tell me if the statements equivalent, not equivalent.\nSentence 1: {hypothesis}\nSentence 2: {premise}\n"
data_sample['output'] = f"Answer: {answer}."
data_sample['answer'] = answer

Note that the evaluation code also needs to process the ground truth (gt) accordingly. Thanks!
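For anyone who wants to apply the snippet above to a whole split, here is a minimal sketch that wraps the same conversion in a function. The function name `convert_mrpc_sample`, the constant `LABEL_NAMES`, and the two toy samples are illustrative assumptions, not code from the repository; the field names and prompt text are copied from the snippet above.

```python
# Label index -> text, as in the snippet above (0: not equivalent, 1: equivalent).
LABEL_NAMES = ["not equivalent", "equivalent"]


def convert_mrpc_sample(sample):
    """Turn one MRPC record into an instruction-style record.

    `sample` is assumed to be a dict with "sentence1", "sentence2",
    and an integer "label" (0 or 1).
    """
    answer = LABEL_NAMES[sample["label"]]
    return {
        "input": "",
        # Prompt text kept exactly as in the original snippet.
        "instruction": (
            "Tell me if the statements equivalent, not equivalent.\n"
            f"Sentence 1: {sample['sentence1']}\n"
            f"Sentence 2: {sample['sentence2']}\n"
        ),
        "output": f"Answer: {answer}.",
        "answer": answer,
    }


# Toy samples, made up for illustration only.
toy_split = [
    {"sentence1": "He left at noon.", "sentence2": "He departed at 12 pm.", "label": 1},
    {"sentence1": "She likes tea.", "sentence2": "She sold the car.", "label": 0},
]
converted = [convert_mrpc_sample(s) for s in toy_split]
```

Returning a new dict instead of overwriting `data_sample` keeps the raw record available, which can help when debugging label mismatches.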

GCYZSL commented 3 months ago

You can process the samples in the MRPC dataset by following the instructions in the README. This is how we process the data:

# Each MRPC sample has "sentence1", "sentence2", and an integer "label" (0 or 1).
hypothesis = data_sample["sentence1"]
premise = data_sample["sentence2"]
# Map the integer label to text: 0 -> "not equivalent", 1 -> "equivalent".
answer = ["not equivalent", "equivalent"][data_sample["label"]]
print(data_sample["label"], answer)
# Rebuild the sample in the instruction format.
data_sample = {}
data_sample['input'] = ""
data_sample['instruction'] = f"Tell me if the statements equivalent, not equivalent.\nSentence 1: {hypothesis}\nSentence 2: {premise}\n"
data_sample['output'] = f"Answer: {answer}."
data_sample['answer'] = answer

Please note that the corresponding evaluation script should be modified as well.
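As a rough illustration of what such an evaluation change might look like, the sketch below parses the label out of generated text and compares it with the `answer` field. This is not the repository's evaluation script: the helper names `extract_label` and `accuracy` are hypothetical, and it assumes the model's output follows the `Answer: <label>.` pattern used in the `output` field.

```python
import re


def extract_label(text):
    """Return "not equivalent", "equivalent", or None if no label is found.

    "not equivalent" is listed first in the pattern because "equivalent"
    is a substring of it; a plain substring check for "equivalent" would
    match both labels.
    """
    match = re.search(r"answer:\s*(not equivalent|equivalent)", text.lower())
    return match.group(1) if match else None


def accuracy(generations, gold_answers):
    """Fraction of generations whose parsed label equals the gold answer.

    Generations with no parsable label count as wrong.
    """
    correct = sum(
        extract_label(g) == gold for g, gold in zip(generations, gold_answers)
    )
    return correct / len(gold_answers)
```

For example, `extract_label("Answer: not equivalent.")` yields `"not equivalent"`, while text without an `Answer:` prefix yields `None` and is scored as incorrect.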

GCYZSL commented 3 months ago

The processed data can be downloaded here: https://drive.google.com/file/d/1-AHDmTKnds9JTJPFr1CFCqneGjj1HB0M/view?usp=sharing