THUDM / P-tuning

A novel method to tune language models. Code and datasets for the paper "GPT understands, too".

Details of the fully-supervised learning experiments #41


DonceDace commented 1 year ago

Take the results reported for the CB dataset with bert-base-cased in the P-tuning paper as an example: ACC 89.2, F1 92.1. The paper also says:

MP zero-shot and MP fine-tuning report results of a single pattern, while anchors for P-tuning are selected from the same prompt.

Does this mean that MP zero-shot, MP fine-tuning, and P-tuning all report results using the same single pattern?

Also, after running the code, the result I get is a mean ± standard deviation. So for the fully-supervised learning experiments, are the performance numbers reported in the paper just the means?

Looking through the code, I found these searched patterns for fully-supervised learning:

    # string_list_a = [text_a, ' question: ', text_b, ' true, false or neither? answer:', "the", self.mask]
    # string_list_a = [text_a,  "[SEP]", example.text_b, "?", 'the',  " answer: ", self.mask]
    # string_list_a = [text_a,  "the",  text_b, "?",  "Answer:", self.mask]
    # string_list_a = [text_a, 'the the', 'question:', text_b, '?', 'the the', 'answer:', self.mask]
    # string_list_a = [text_a, "[SEP]", text_b, "?", "the", self.mask]
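
To make the pattern format concrete, here is a minimal sketch of how one of these lists might be rendered into a cloze-style input. This is my own illustration, not the repo's code: I am assuming text_a is the CB premise, text_b the hypothesis, and self.mask the model's mask token, and the example sentences are made up.

    MASK = "[MASK]"

    def render(string_list):
        # The real code tokenizes each piece; plain joining is enough to see the shape.
        return " ".join(piece.strip() for piece in string_list if piece.strip())

    text_a = "The meeting was moved to Friday."   # made-up premise
    text_b = "the meeting happened on Monday"     # made-up hypothesis
    pattern = [text_a, "question:", text_b, "true, false or neither? answer:", "the", MASK]
    print(render(pattern))
    # -> The meeting was moved to Friday. question: the meeting happened on Monday
    #    true, false or neither? answer: the [MASK]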

For the fully-supervised learning experiments in P-tuning, is the reported performance the average over these 5 patterns, or is each pattern run 3 times, with the highest per-pattern mean among the 5 patterns reported?
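
To spell out the two reporting schemes I am asking about, here is a sketch. The per-seed scores are placeholders I made up purely for illustration, not numbers from the paper:

    import statistics

    # Made-up per-seed accuracies for each of the 5 patterns (illustration only).
    runs_per_pattern = {
        "pattern_1": [88.5, 89.0, 88.8],
        "pattern_2": [89.1, 89.4, 89.2],
        "pattern_3": [87.9, 88.2, 88.0],
        "pattern_4": [88.7, 88.9, 88.6],
        "pattern_5": [89.0, 89.3, 89.5],
    }

    # Scheme A: pool all pattern x seed runs and report one overall mean +/- std.
    pooled = [s for seeds in runs_per_pattern.values() for s in seeds]
    print(f"A: {statistics.mean(pooled):.1f} +/- {statistics.stdev(pooled):.1f}")

    # Scheme B: mean +/- std per pattern, then report the best pattern's mean.
    per_pattern = {p: (statistics.mean(s), statistics.stdev(s))
                   for p, s in runs_per_pattern.items()}
    best = max(per_pattern, key=lambda p: per_pattern[p][0])
    print(f"B: {best} -> {per_pattern[best][0]:.1f} +/- {per_pattern[best][1]:.1f}")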