DeBertaV2 results are not reproducible across repeated runs (loss differs)

HUSTHY commented 2 years ago

I am using DeBertaV2 for a classification task with the Erlangshen-DeBERTa-v2-97M-Chinese Chinese pretrained weights. Environment: cuda 11.2, torch 1.8.1+cu111, python 3.7.7, transformers 4.21.1. Running the same code twice gives different results, even with the same environment and parameters and with the random seed set. The logs are as follows:
`(hy_py37_torch) [root@localhost ccf_fewshot_classification]# python train_patent_bert_kfold.py
/home/kedu/opt/anaconda3/envs/hy_py37_torch/lib/python3.7/site-packages/sklearn/utils/validation.py:37: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
LARGE_SPARSE_SUPPORTED = LooseVersion(scipy_version) >= '0.14.0'
2022-09-08 17:21:05,479 train_patent_bert_kfold.py [line:95] INFO submit_path------submit/submit_title_abstract_ernie_5fold_integrate_logit_2022-09-08_20.csv
2022-09-08 17:21:05,479 train_patent_bert_kfold.py [line:97] INFO Namespace(accumulation_steps=1, adversarial_type='PGD', batch_size=16, bert_type='deberta', data_type='title_abstract', device='0', duplicate=1, epochs=5, integrate_type='logit', is_adversarial=True, is_masklm=False, is_prompt=False, lr=2e-05, max_len=460, model_out='./output/patent/', pretrained='./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese', prompt_text='[SEP]专利类别[MASK]', random_seed=100, test_file='./data/testA.json', train_file='./data/train.json')
2022-09-08 17:21:05,479 train_patent_bert_kfold.py [line:98] INFO data_type--------title_abstract
2022-09-08 17:21:05,480 train_patent_bert_kfold.py [line:99] INFO patentBert---------./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese
2022-09-08 17:21:05,544 train_patent_bert_kfold.py [line:352] INFO test_datas: 20839
2022-09-08 17:21:05,546 train_patent_bert_kfold.py [line:357] INFO train_datas: 958
tokenization: 20839it [00:15, 1306.00it/s]
2022-09-08 17:21:21,505 train_patent_bert_kfold.py [line:125] INFO ================fold 0===============
2022-09-08 17:21:21,505 train_patent_bert_kfold.py [line:128] INFO save_path---------./output/patent/deberta_186M_title_abstract_2022-09-08_fold_0
Some weights of the model checkpoint at ./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese were not used when initializing PatentDeBertaV2: ['cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias']

This IS expected if you are initializing PatentDeBertaV2 from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing PatentDeBertaV2 from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of PatentDeBertaV2 were not initialized from the model checkpoint at ./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tokenization: 766it [00:00, 1287.14it/s]
tokenization: 192it [00:00, 1282.17it/s]
2022-09-08 17:21:25,928 train_patent_bert_kfold.py [line:156] INFO Running training
2022-09-08 17:21:25,928 train_patent_bert_kfold.py [line:157] INFO Num examples = 48
2022-09-08 17:21:25,929 train_patent_bert_kfold.py [line:158] INFO Num Epochs = 5
2022-09-08 17:21:25,929 train_patent_bert_kfold.py [line:159] INFO Num batch_size = 16
[evaldation] 12/12 [==============================] 97.3ms/step step: 11.0000
2022-09-08 17:21:40,516 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:21:42,196 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.006359------best_macro_f1:0.006359, loss:2.119576
[evaldation] 12/12 [==============================] 96.7ms/step step: 11.0000
2022-09-08 17:21:56,371 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:21:58,020 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.011809------best_macro_f1:0.011809, loss:1.970104
[evaldation] 12/12 [==============================] 97.0ms/step step: 11.0000
2022-09-08 17:22:12,217 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.010802------best_macro_f1:0.011809, loss:0.696715
[evaldation] 12/12 [==============================] 99.2ms/step step: 11.0000
2022-09-08 17:22:26,410 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:22:28,177 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.015645------best_macro_f1:0.015645, loss:0.691476
[evaldation] 12/12 [==============================] 97.7ms/step step: 11.0000
2022-09-08 17:22:42,333 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.008696------best_macro_f1:0.015645, loss:0.281529
(hy_py37_torch) [root@localhost ccf_fewshot_classification]# python train_patent_bert_kfold.py
/home/kedu/opt/anaconda3/envs/hy_py37_torch/lib/python3.7/site-packages/sklearn/utils/validation.py:37: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
LARGE_SPARSE_SUPPORTED = LooseVersion(scipy_version) >= '0.14.0'
2022-09-08 17:23:30,084 train_patent_bert_kfold.py [line:95] INFO submit_path------submit/submit_title_abstract_ernie_5fold_integrate_logit_2022-09-08_20.csv
2022-09-08 17:23:30,084 train_patent_bert_kfold.py [line:97] INFO Namespace(accumulation_steps=1, adversarial_type='PGD', batch_size=16, bert_type='deberta', data_type='title_abstract', device='0', duplicate=1, epochs=5, integrate_type='logit', is_adversarial=True, is_masklm=False, is_prompt=False, lr=2e-05, max_len=460, model_out='./output/patent/', pretrained='./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese', prompt_text='[SEP]专利类别[MASK]', random_seed=100, test_file='./data/testA.json', train_file='./data/train.json')
2022-09-08 17:23:30,084 train_patent_bert_kfold.py [line:98] INFO data_type--------title_abstract
2022-09-08 17:23:30,084 train_patent_bert_kfold.py [line:99] INFO patentBert---------./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese
2022-09-08 17:23:30,149 train_patent_bert_kfold.py [line:352] INFO test_datas: 20839
2022-09-08 17:23:30,151 train_patent_bert_kfold.py [line:357] INFO train_datas: 958
tokenization: 20839it [00:16, 1264.25it/s]
2022-09-08 17:23:46,638 train_patent_bert_kfold.py [line:125] INFO ================fold 0===============
2022-09-08 17:23:46,638 train_patent_bert_kfold.py [line:128] INFO save_path---------./output/patent/deberta_186M_title_abstract_2022-09-08_fold_0
Some weights of the model checkpoint at ./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese were not used when initializing PatentDeBertaV2: ['cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
This IS expected if you are initializing PatentDeBertaV2 from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing PatentDeBertaV2 from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of PatentDeBertaV2 were not initialized from the model checkpoint at ./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tokenization: 766it [00:00, 1286.47it/s]
tokenization: 192it [00:00, 1277.11it/s]
2022-09-08 17:23:51,239 train_patent_bert_kfold.py [line:156] INFO Running training
2022-09-08 17:23:51,239 train_patent_bert_kfold.py [line:157] INFO Num examples = 48
2022-09-08 17:23:51,239 train_patent_bert_kfold.py [line:158] INFO Num Epochs = 5
2022-09-08 17:23:51,239 train_patent_bert_kfold.py [line:159] INFO Num batch_size = 16
[evaldation] 12/12 [==============================] 96.2ms/step step: 11.0000
2022-09-08 17:24:05,923 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:24:07,702 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.007106------best_macro_f1:0.007106, loss:2.129214
[evaldation] 12/12 [==============================] 97.4ms/step step: 11.0000
2022-09-08 17:24:21,986 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:24:23,790 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.011858------best_macro_f1:0.011858, loss:1.971897
[evaldation] 12/12 [==============================] 96.5ms/step step: 11.0000
2022-09-08 17:24:38,055 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.010802------best_macro_f1:0.011858, loss:0.720681
[evaldation] 12/12 [==============================] 99.7ms/step step: 11.0000
2022-09-08 17:24:52,246 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:24:53,892 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.013702------best_macro_f1:0.013702, loss:0.709787
[evaldation] 12/12 [==============================] 98.7ms/step step: 11.0000
2022-09-08 17:25:08,261 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.007966------best_macro_f1:0.013702, loss:0.290335 `
As the logs show, the loss differs between the two runs. Further experiments also show that the difference depends on the length of the sentences fed to the model: with sen_length < 100 there is no difference, while above 300 to 400 the difference is obvious.
fengshenbang_issue.zip
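
One possible reading of the length dependence, offered only as an assumption and not confirmed for this model: floating-point addition is not associative, so any kernel that accumulates values in a run-dependent order can produce slightly different sums, and longer inputs mean more accumulated terms. The pure-Python sketch below illustrates only that arithmetic effect in double precision; `sum_in_order` and `max_order_spread` are hypothetical helper names, and nothing here touches the attached training code or the GPU.

```python
import random


def sum_in_order(values, order):
    """Accumulate values left to right in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def max_order_spread(n, trials=20, seed=100):
    """Largest gap between sums of the same n numbers taken in shuffled orders."""
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    order = list(range(n))
    sums = []
    for _ in range(trials):
        rng.shuffle(order)
        sums.append(sum_in_order(values, order))
    return max(sums) - min(sums)


# The same three numbers, grouped differently, do not add to the same double.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

# Compare the order sensitivity for a short and a long "sequence".
for n in (100, 400):
    print(n, max_order_spread(n))
```

The spreads printed here are tiny because Python uses 64-bit floats; in 32-bit arithmetic the rounding steps are much coarser, and small differences can then be amplified over many training steps.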

The code is in the attachment.
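
For readers without the attachment, the sketch below shows what "setting the random seed" commonly involves in a PyTorch script. It is not the attached code: `seed_everything` is a hypothetical helper, and the NumPy and PyTorch imports are optional so the snippet also runs where they are not installed. The value 100 matches `random_seed=100` from the logged arguments.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed the common RNGs and request deterministic kernels (hypothetical helper)."""
    # cuBLAS on CUDA 10.2+ needs this workspace setting for deterministic
    # matrix multiplications; it must be set before the first CUDA call.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed

    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; only the Python RNG was seeded

    torch.manual_seed(seed)           # CPU generator (also seeds CUDA generators)
    torch.cuda.manual_seed_all(seed)  # explicit for all GPUs; no-op without CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Available from torch 1.8: raise an error when an operation without a
    # deterministic implementation is called, which helps locate it.
    if hasattr(torch, "use_deterministic_algorithms"):
        torch.use_deterministic_algorithms(True)


seed_everything(100)
```

Seeding alone fixes only the random number streams. With `torch.use_deterministic_algorithms(True)` enabled, a `RuntimeError` during training points at an operation that has no deterministic implementation, which may be a useful starting point for narrowing down where two runs diverge.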