ERNIE-VIL downstream task test results are unstable

corlder commented 3 years ago

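Hello, I fine-tuned erniesmall on a downstream task similar to VCR (reusing the VCR code), but at test time I found that the accuracy on the same val or test split is close across runs, yet not identical. To rule out a dataset problem, I copied a single example 1000 times and used that as the validation set, and the results still differed from run to run. Here are the outputs of three runs:

First run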

~/vilio/vilio/ernie-vil$ python test_run.py 

finetuning tasks start
attention_probs_dropout_prob: 0.1
class_attr_size: 401
class_size: 1601
co_hidden_size: 1024
co_intermediate_size: 1024
co_num_attention_heads: 8
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
max_position_embeddings: 512
num_attention_heads: 12
num_hidden_layers: 12
sent_type_vocab_size: 4
t_biattention_id: [6, 7, 8, 9, 10, 11]
task_type_vocab_size: 16
type_vocab_size: 2
v_biattention_id: [0, 1, 2, 3, 4, 5]
v_hidden_size: 1024
v_intermediate_size: 1024
v_num_attention_heads: 8
vocab_size: 30522
------------------------------------------------
task:  [{'task': 'PMR', 'num_choice': 4, 'annotations_jsonpath_train': './data/pmr/annotations/train.jsonl', 'annotations_jsonpath_val': './data/pmr/annotations/val.jsonl', 'annotations_jsonpath_test': './data/pmr/annotations/test.jsonl', 'feature_lmdb_path': './data/pmr/annotations/pmr_10-36.tsv', 'gt_feature_lmdb_path': './data/pmr/annotations/pmr_10-36.tsv', 'unisex_names_table': './data/pmr/annotations/unisex_names_table.csv', 'Proprocessor': 'PreprocessorBasic', 'tokenizer_name': 'FullTokenizer', 'tagger_path': './script/ntc.pickle', 'nltk_data_path': './nltk_data', 'fusion_method': 'mul', 'dropout_rate': 0.1, 'max_seq_len': 60, 'use_gt_fea': False, 'task_prefix': 'pmr'}]
2021-06-09 16:19:43,389-WARNING: paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
theoretical memory usage: 
(5271.439203643799, 5522.4601181030275, 'MB')
W0609 16:19:44.479754 15199 device_context.cc:252] Please NOTE: device: 4, CUDA Capability: 75, Driver API Version: 11.2, Runtime API Version: 9.0
W0609 16:19:44.481159 15199 device_context.cc:260] device: 4, cuDNN Version: 8.0.
Start to load Faster-RCNN detected objects from ./data/pmr/annotations/pmr_10-36.tsv
Loaded 1 images in file ./data/pmr/annotations/pmr_10-36.tsv in 23 seconds.
only butd feature
Load pretraining parameters from ./output_pmr/step_15000train.
testing on pmr val split
task name list :  ['mean_0.tmp_0', 'accuracy_0.tmp_0', 'arg_max_0.tmp_0', 'read_file_0.tmp_9', 'read_file_0.tmp_8', 'reshape2_120.tmp_0']
cur_step: 10 cur_acc: 1.0
cur_step: 20 cur_acc: 1.0
cur_step: 30 cur_acc: 1.0
cur_step: 40 cur_acc: 0.9984375
cur_step: 50 cur_acc: 0.99875
cur_step: 60 cur_acc: 0.9979166666666667
EXCEPTING
LEN: 1075 1075
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1075 entries, 0 to 1074
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   id      1075 non-null   int64  
 1   proba   1075 non-null   float64
 2   label   1075 non-null   int64  
dtypes: float64(1), int64(2)
memory usage: 25.3 KB
None
average_acc: 0.9981617647058824

Second run

finetuning tasks start
attention_probs_dropout_prob: 0.1
class_attr_size: 401
class_size: 1601
co_hidden_size: 1024
co_intermediate_size: 1024
co_num_attention_heads: 8
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
max_position_embeddings: 512
num_attention_heads: 12
num_hidden_layers: 12
sent_type_vocab_size: 4
t_biattention_id: [6, 7, 8, 9, 10, 11]
task_type_vocab_size: 16
type_vocab_size: 2
v_biattention_id: [0, 1, 2, 3, 4, 5]
v_hidden_size: 1024
v_intermediate_size: 1024
v_num_attention_heads: 8
vocab_size: 30522
------------------------------------------------
task:  [{'task': 'PMR', 'num_choice': 4, 'annotations_jsonpath_train': './data/pmr/annotations/train.jsonl', 'annotations_jsonpath_val': './data/pmr/annotations/val.jsonl', 'annotations_jsonpath_test': './data/pmr/annotations/test.jsonl', 'feature_lmdb_path': './data/pmr/annotations/pmr_10-36.tsv', 'gt_feature_lmdb_path': './data/pmr/annotations/pmr_10-36.tsv', 'unisex_names_table': './data/pmr/annotations/unisex_names_table.csv', 'Proprocessor': 'PreprocessorBasic', 'tokenizer_name': 'FullTokenizer', 'tagger_path': './script/ntc.pickle', 'nltk_data_path': './nltk_data', 'fusion_method': 'mul', 'dropout_rate': 0.1, 'max_seq_len': 60, 'use_gt_fea': False, 'task_prefix': 'pmr'}]
2021-06-09 16:22:23,550-WARNING: paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
theoretical memory usage: 
(5271.439203643799, 5522.4601181030275, 'MB')
W0609 16:22:24.612071 15347 device_context.cc:252] Please NOTE: device: 4, CUDA Capability: 75, Driver API Version: 11.2, Runtime API Version: 9.0
W0609 16:22:24.613502 15347 device_context.cc:260] device: 4, cuDNN Version: 8.0.
Start to load Faster-RCNN detected objects from ./data/pmr/annotations/pmr_10-36.tsv
Loaded 1 images in file ./data/pmr/annotations/pmr_10-36.tsv in 23 seconds.
only butd feature
Load pretraining parameters from ./output_pmr/step_15000train.
testing on pmr val split
task name list :  ['mean_0.tmp_0', 'accuracy_0.tmp_0', 'arg_max_0.tmp_0', 'read_file_0.tmp_9', 'read_file_0.tmp_8', 'reshape2_120.tmp_0']
cur_step: 10 cur_acc: 1.0
cur_step: 20 cur_acc: 1.0
cur_step: 30 cur_acc: 0.9979166666666667
cur_step: 40 cur_acc: 0.9984375
cur_step: 50 cur_acc: 0.99875
cur_step: 60 cur_acc: 0.9989583333333333
EXCEPTING
LEN: 1075 1075
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1075 entries, 0 to 1074
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   id      1075 non-null   int64  
 1   proba   1075 non-null   float64
 2   label   1075 non-null   int64  
dtypes: float64(1), int64(2)
memory usage: 25.3 KB
None
average_acc: 0.9981617647058824

Third run

finetuning tasks start
attention_probs_dropout_prob: 0.1
class_attr_size: 401
class_size: 1601
co_hidden_size: 1024
co_intermediate_size: 1024
co_num_attention_heads: 8
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
max_position_embeddings: 512
num_attention_heads: 12
num_hidden_layers: 12
sent_type_vocab_size: 4
t_biattention_id: [6, 7, 8, 9, 10, 11]
task_type_vocab_size: 16
type_vocab_size: 2
v_biattention_id: [0, 1, 2, 3, 4, 5]
v_hidden_size: 1024
v_intermediate_size: 1024
v_num_attention_heads: 8
vocab_size: 30522
------------------------------------------------
task:  [{'task': 'PMR', 'num_choice': 4, 'annotations_jsonpath_train': './data/pmr/annotations/train.jsonl', 'annotations_jsonpath_val': './data/pmr/annotations/val.jsonl', 'annotations_jsonpath_test': './data/pmr/annotations/test.jsonl', 'feature_lmdb_path': './data/pmr/annotations/pmr_10-36.tsv', 'gt_feature_lmdb_path': './data/pmr/annotations/pmr_10-36.tsv', 'unisex_names_table': './data/pmr/annotations/unisex_names_table.csv', 'Proprocessor': 'PreprocessorBasic', 'tokenizer_name': 'FullTokenizer', 'tagger_path': './script/ntc.pickle', 'nltk_data_path': './nltk_data', 'fusion_method': 'mul', 'dropout_rate': 0.1, 'max_seq_len': 60, 'use_gt_fea': False, 'task_prefix': 'pmr'}]
2021-06-09 16:23:45,356-WARNING: paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
theoretical memory usage: 
(5271.439203643799, 5522.4601181030275, 'MB')
W0609 16:23:46.432518 15492 device_context.cc:252] Please NOTE: device: 4, CUDA Capability: 75, Driver API Version: 11.2, Runtime API Version: 9.0
W0609 16:23:46.433915 15492 device_context.cc:260] device: 4, cuDNN Version: 8.0.
Start to load Faster-RCNN detected objects from ./data/pmr/annotations/pmr_10-36.tsv
Loaded 1 images in file ./data/pmr/annotations/pmr_10-36.tsv in 24 seconds.
only butd feature
Load pretraining parameters from ./output_pmr/step_15000train.
testing on pmr val split
task name list :  ['mean_0.tmp_0', 'accuracy_0.tmp_0', 'arg_max_0.tmp_0', 'read_file_0.tmp_9', 'read_file_0.tmp_8', 'reshape2_120.tmp_0']
cur_step: 10 cur_acc: 0.99375
cur_step: 20 cur_acc: 0.996875
cur_step: 30 cur_acc: 0.9958333333333333
cur_step: 40 cur_acc: 0.996875
cur_step: 50 cur_acc: 0.99625
cur_step: 60 cur_acc: 0.9958333333333333
EXCEPTING
LEN: 1075 1075
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1075 entries, 0 to 1074
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   id      1075 non-null   int64  
 1   proba   1075 non-null   float64
 2   label   1075 non-null   int64  
dtypes: float64(1), int64(2)
memory usage: 25.3 KB
None
average_acc: 0.9963235294117647

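I first suspected that I had forgotten to switch to evaluation mode, but after reading finetune.py carefully I found that it already contains the line test_prog = test_prog.clone(for_test=True), so that should not be the cause. Could you help me work out what might be causing this? Thanks.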
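
One way to narrow this down is to compare the per-example probabilities from two runs rather than only the aggregate accuracy. The following is a minimal sketch, not code from the repository: `diff_predictions` is a hypothetical helper, and the two dictionaries hold made-up values standing in for the `id` and `proba` columns shown in the logs.

```python
def diff_predictions(run_a, run_b, tol=1e-6):
    """Return the sorted ids whose probability differs by more than tol."""
    shared_ids = run_a.keys() & run_b.keys()
    return sorted(i for i in shared_ids if abs(run_a[i] - run_b[i]) > tol)


# Made-up illustrative values (id -> proba); not taken from the runs above.
run_a = {0: 0.91, 1: 0.40, 2: 0.77}
run_b = {0: 0.91, 1: 0.55, 2: 0.77}

changed = diff_predictions(run_a, run_b)
print(changed)  # [1]
```

If the set of changed ids varies between pairs of runs even though the inputs are identical, the variation is probably introduced before or inside the forward pass (for example in preprocessing) rather than in the metric computation.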

yinweichong commented 3 years ago

During data processing, each detection reference is randomly replaced with a name (following the approach in ViLBERT). See https://github.com/PaddlePaddle/ERNIE/blob/2641a12a472a94f5b719dc59fb7c6f231ac93d42/ernie-vil/reader/vcr_finetuning.py#L252
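
To illustrate why random name replacement changes the input between runs, and how a fixed seed removes that variation, here is a minimal sketch. It assumes the names are drawn with Python's `random` module; `replace_person_tags`, `NAMES`, and the seed value are hypothetical and are not taken from `vcr_finetuning.py`.

```python
import random

# Hypothetical stand-in for the unisex names table; not the repository's list.
NAMES = ["Casey", "Riley", "Jessie", "Jackie", "Avery"]


def replace_person_tags(tokens, names, rng):
    """Replace each integer detection index with a randomly drawn name.

    The same index always maps to the same name within one call.
    """
    mapping = {}
    out = []
    for tok in tokens:
        if isinstance(tok, int):
            if tok not in mapping:
                mapping[tok] = rng.choice(names)
            out.append(mapping[tok])
        else:
            out.append(tok)
    return out


tokens = ["why", "is", 0, "looking", "at", 1, "?"]

# Two generators built from the same seed produce the same replacements,
# so the text fed to the model is identical across runs.
seeded_a = replace_person_tags(tokens, NAMES, random.Random(1234))
seeded_b = replace_person_tags(tokens, NAMES, random.Random(1234))
print(seeded_a == seeded_b)  # True
```

Without a fixed seed, each run may draw different names, so the tokenized input differs and the predicted probabilities can shift slightly. Seeding the generator, or using a deterministic index-to-name mapping at evaluation time, would be one way to make test results repeatable.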

corlder commented 3 years ago

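OK, that was indeed the cause; thanks for the reply. I have one more question, though: after ERNIE-VIL processes the data this way, do the person names in the text no longer correspond to the image regions in the object categories? In other words, is it the case that which person in the image a name refers to is not given explicitly, and the model has to work it out on its own?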

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reopen it. Thank you for your contributions.

PaddlePaddle / ERNIE

ERNIE-VIL downstream task test results are unstable #694
