THUDM / P-tuning

A novel method to tune language models. Code and datasets for the paper "GPT Understands, Too".
MIT License

Reproduction issues on some Few-shot SuperGLUE datasets #15

Open Riroaki opened 3 years ago

Riroaki commented 3 years ago

Hello, when reproducing the Few-shot SuperGLUE experiments (i.e., the FewGLUE_32dev data), my results on the CB, WSC, and COPA datasets show a noticeable gap from those reported in the paper (all reproduction runs use the albert-xxlarge-v2 pretrained model, consistent with the paper's design, with seed=42 unchanged): [results screenshot]

Differences in experimental setup:

Experiments on the CB dataset

Differences in Python library versions

Since version differences might explain the gap, here are the Python library versions I used, matched against requirements.txt (the versions from the project's requirements are in parentheses):

Hardware differences

All reproduction experiments were run on a single GeForce RTX 3090.

How should I interpret these gaps in model performance?
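For reference, a generic seeding sketch of the randomness that seed=42 is assumed to pin down (not the repository's exact code; CUDA kernels can still introduce non-determinism on top of this):

```python
# Generic seed-pinning sketch: the RNGs a fixed seed like 42 is usually meant to control.
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Pin the Python, NumPy, and PyTorch (CPU + CUDA) RNGs to one seed."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


set_seed(42)
```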

slczgwh commented 3 years ago

I happen to have the same question. In my SuperGLUE experiments I found that the scores on several datasets correlate strongly with the random seed (and even more with the codebase used; running with Jiant versus AllenNLP already gives scores that differ by several points). On CB the score can even swing from the low 70s to the 90s. How did the authors handle these sources of randomness?

slczgwh commented 3 years ago

On the CB dataset, just running BERT-BASE-UNCASED with ten different random seeds already gives differences of this magnitude (results from Jiant):

| Run | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Seed 6 | Seed 7 | Seed 8 | Seed 9 | Seed 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| f13_bb | 0.912281 | 0.866667 | 0.867925 | 0.945455 | 0.867925 | 0.857143 | 0.915254 | 0.912281 | 0.836364 | 0.912281 |
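A quick way to quantify that spread, using only the numbers in the row above:

```python
# Mean, sample standard deviation, and range of the ten CB scores listed above.
import statistics

scores = [0.912281, 0.866667, 0.867925, 0.945455, 0.867925,
          0.857143, 0.915254, 0.912281, 0.836364, 0.912281]
print(f"mean  = {statistics.mean(scores):.3f}")     # ≈ 0.889
print(f"stdev = {statistics.stdev(scores):.3f}")    # ≈ 0.034
print(f"range = {max(scores) - min(scores):.3f}")   # ≈ 0.109
```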

ywb2018 commented 3 years ago

When you ran it, did you set the emb size to 768? Did you change any other code? I'm running RTE and the score is very low, only around 30-40, and I don't know why.

rookiebird commented 3 years ago

> When you ran it, did you set the emb size to 768? Did you change any other code? I'm running RTE and the score is very low, only around 30-40, and I don't know why.

It is indeed puzzling. I tried running the CB script and it errors out: the prompt embedding size defaults to 128, so it doesn't match when substituting into the BERT embeddings, yet the CB script doesn't specify that embedding-size argument at all.
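For illustration, a minimal sketch (hypothetical names, not the repository's code) of why the prompt embedding width has to match the backbone's word-embedding width when prompt vectors are spliced into the input embeddings:

```python
# Hypothetical sketch: spliced prompt vectors must have the same width as the
# backbone's word embeddings, otherwise torch.cat raises a size-mismatch error.
import torch
from torch import nn
from transformers import AutoModel

backbone = AutoModel.from_pretrained("bert-base-uncased")
word_dim = backbone.get_input_embeddings().embedding_dim      # 768 for bert-base

prompt_len = 3
prompt_embed = nn.Embedding(prompt_len, word_dim)             # must be word_dim, not a hard-coded 128

input_ids = torch.tensor([[101, 2023, 2003, 102]])            # arbitrary token ids for illustration
tok_embeds = backbone.get_input_embeddings()(input_ids)       # (1, 4, 768)
prompt = prompt_embed(torch.arange(prompt_len)).unsqueeze(0)  # (1, 3, 768)

inputs_embeds = torch.cat([prompt, tok_embeds], dim=1)        # (1, 7, 768); fails if widths differ
out = backbone(inputs_embeds=inputs_embeds)
```

Note that albert-xxlarge-v2 factorizes its embeddings, so its word-embedding width is 128, which may be why a default of 128 works for the ALBERT scripts but not for BERT's 768.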

Xiao9905 commented 2 years ago

Thanks for your great work in reproducing P-tuning on few-shot SuperGLUE. In practice, we find that the reproducibility of few-shot learning depends heavily on the environment, the hyper-parameters (e.g., batch size and gradient accumulation steps), and the number of parallel GPUs. For example, in our experiments we use 8 V100 GPUs to train on a single dataset, and if fewer GPUs or a different type of GPU is used, the performance can vary greatly.

In light of this volatility, our follow-up work FewNLU (@zheng-yanan) presents a more robust evaluation framework for few-shot SuperGLUE, and P-tuning is re-implemented in the FewNLU framework. Please check it out if you have trouble setting up the same environment for a fair comparison.
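One concrete way the GPU count enters is through the effective batch size; a small sanity-check sketch (the numbers below are placeholders, not the paper's actual settings):

```python
# Effective batch size = per-device batch size x gradient accumulation steps x number of GPUs.
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    return per_device_batch * grad_accum_steps * num_gpus

# Multi-GPU setting, e.g. 8 V100s (a per-device batch of 2 is illustrative only).
print(effective_batch_size(per_device_batch=2, grad_accum_steps=1, num_gpus=8))  # 16

# Single-GPU reproduction: accumulation has to grow to keep the same effective batch.
print(effective_batch_size(per_device_batch=2, grad_accum_steps=8, num_gpus=1))  # 16
```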

SCU-JJkinging commented 2 years ago

> prompt embedding

Does the prompt embedding size need to be set equal to the pretrained model's embedding_dim? Running the authors' code as-is raises a dimension-mismatch error.
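A quick way to check which value to pass (a sketch, not the authors' code): read the word-embedding width directly from the pretrained backbone and use that as the prompt embedding size.

```python
# Print the word-embedding width of each backbone; the prompt embedding should match it.
from transformers import AutoModel

for name in ("albert-xxlarge-v2", "bert-base-uncased"):
    model = AutoModel.from_pretrained(name)
    print(name, model.get_input_embeddings().embedding_dim)
# albert-xxlarge-v2 -> 128 (ALBERT factorizes its embeddings); bert-base-uncased -> 768
```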