PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

With paddle 2.3, SBert training results are not reproducible, and performance drops sharply compared with torch #2989

Closed HUSTHY closed 1 year ago

HUSTHY commented 2 years ago

When reproducing SBert with paddle2.3.1.post112 / cuda11.2 / a 3090 GPU / paddlenlp2.3.4 / Linux / python3.7, with the random seeds fixed via paddle.seed(100), random.seed(100), np.random.seed(100) and with FLAGS_cudnn_deterministic = True, the results still cannot be reproduced: both the acc and the loss differ from run to run. Compared with the torch version on the same dataset (paws_x) and the same BERT pretrained weights, bert-wwm-ext-chinese: torch is reproducible and reaches acc 0.75 (torch's results also jump around when no seed is set), while paddle's acc randomly lands between 0.55 and 0.62. Is this an environment/version problem, a paddle framework problem, or a problem in my code? How do I fix it?
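For reference, a minimal sketch of the seed-fixing setup described above; whether the flag is set through the environment or via paddle.set_flags is an assumption here, not the exact code from the attached script:

```python
import os

# Assumption: set the deterministic-cuDNN flag before any CUDA kernel runs;
# paddle.set_flags({"FLAGS_cudnn_deterministic": True}) is an alternative.
os.environ["FLAGS_cudnn_deterministic"] = "True"

import random

import numpy as np
import paddle


def set_seed(seed: int = 100) -> None:
    """Fix every RNG the training script touches, as described above."""
    paddle.seed(seed)     # Paddle's global generator (init, dropout, ...)
    random.seed(seed)     # Python-level shuffling
    np.random.seed(seed)  # NumPy-based sampling


set_seed(100)
```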

w5688414 commented 2 years ago

> When reproducing SBert with paddle2.3.1.post112 / cuda11.2 / a 3090 GPU / paddlenlp2.3.4 / Linux / python3.7, with the random seeds fixed via paddle.seed(100), random.seed(100), np.random.seed(100) and with FLAGS_cudnn_deterministic = True, the results still cannot be reproduced: both the acc and the loss differ from run to run. Compared with the torch version on the same dataset (paws_x) and the same BERT pretrained weights, bert-wwm-ext-chinese: torch is reproducible and reaches acc 0.75 (torch's results also jump around when no seed is set), while paddle's acc randomly lands between 0.55 and 0.62. Is this an environment/version problem, a paddle framework problem, or a problem in my code? How do I fix it?

BERT has sources of randomness such as dropout and the dataloader's shuffle. You can disable these random factors (dropout and the like) and then check whether that gets you what you want. A sketch of removing the dataloader shuffle is shown below.
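As an illustration, a sketch with placeholder names (`train_ds` and `collate_fn` are not from the attached code):

```python
import paddle

# Disable shuffling so the batch order is identical across runs; a
# BatchSampler seeded from the fixed global seed would be an alternative
# if shuffling must be kept.
train_loader = paddle.io.DataLoader(
    train_ds,               # placeholder dataset
    batch_size=64,
    shuffle=False,          # removes one source of run-to-run randomness
    collate_fn=collate_fn,  # placeholder collate function
    drop_last=False,
)
```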

HUSTHY commented 2 years ago

I have ruled all of that out. The cause is that the gradients from backpropagation differ on every run; if I disable the optimizer, the results become reproducible. The official staff said that some paddle ops are non-deterministic, which makes the gradient-based parameter updates differ between runs.
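One way to check this claim is to compare a per-parameter gradient checksum between two seeded runs after the first backward pass; a rough sketch, assuming `model` comes from the attached training script and that parameter gradients are exposed as tensors:

```python
# Right after the first loss.backward(): if these sums differ between two
# runs with identical seeds, the non-determinism already appears in
# backpropagation, before the optimizer step.
for name, param in model.named_parameters():
    if param.grad is not None:
        print(name, param.grad.abs().sum().item())
```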

w5688414 commented 2 years ago

> I have ruled all of that out. The cause is that the gradients from backpropagation differ on every run; if I disable the optimizer, the results become reproducible. The official staff said that some paddle ops are non-deterministic, which makes the gradient-based parameter updates differ between runs.

dropout, the dataloader, and the batch_sampler are among the few sources of randomness; the rest should be fine. I have run experiments with the optimizer and it should not be a significant problem. Could you provide a code example so we can take a look?

HUSTHY commented 2 years ago

I have uploaded the complete code in the attachment paddle_first_demo.zip: paddle_first_demo.zip

w5688414 commented 2 years ago

> I have uploaded the complete code in the attachment paddle_first_demo.zip: paddle_first_demo.zip

Hi, sorry for the late reply. You can set the parameters below to 0.

                 hidden_dropout_prob=0.1,
                 attention_probs_dropout_prob=0.1,

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/examples/text_matching/simcse/train.py

Please refer to that script for how to do it.
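A minimal sketch of loading the pretrained model with both dropout probabilities overridden to 0, in the spirit of the SimCSE example above; whether `from_pretrained` forwards these keyword arguments this way depends on the PaddleNLP version, so treat it as an assumption:

```python
from paddlenlp.transformers import BertModel, BertTokenizer

# Zero both dropout probabilities so the forward pass is deterministic
# (given fixed seeds and deterministic kernels).
model = BertModel.from_pretrained(
    "bert-wwm-ext-chinese",
    hidden_dropout_prob=0.0,
    attention_probs_dropout_prob=0.0,
)
tokenizer = BertTokenizer.from_pretrained("bert-wwm-ext-chinese")
```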

HUSTHY commented 2 years ago

Thanks for the reply. I believe I have already tried this: setting the dropout parameters in the bert config file to 0 is equivalent to disabling dropout both in the BERT layers and in the attention module; it did not seem to have any effect.

w5688414 commented 2 years ago

What about on CPU?

> Thanks for the reply. I believe I have already tried this: setting the dropout parameters in the bert config file to 0 is equivalent to disabling dropout both in the BERT layers and in the attention module; it did not seem to have any effect.

OK, so it seems there is still randomness. Try running on CPU and see whether the results match there?

HUSTHY commented 2 years ago

Though I am not sure whether that setting actually took effect; let me go double-check.

HUSTHY commented 2 years ago

Confirmed: there is no randomness on CPU.
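For completeness, the CPU check only needs the device switched before the model is built; a sketch (the GPU device id is taken from the logs above):

```python
import paddle

# Run the whole script on CPU to rule out GPU/cuDNN kernels as the source
# of non-determinism; switch back to "gpu:1" for the GPU comparison.
paddle.set_device("cpu")
```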

HUSTHY commented 2 years ago

One more question, please: when implementing the SimCSE algorithm with paddle, the bert config's hidden_dropout_prob and attention_probs_dropout_prob must not be 0, so is there any way around this?

w5688414 commented 2 years ago

If dropout is not 0 then, as I understand it, the results cannot be pinned down; dropout inherently introduces randomness.

HUSTHY commented 2 years ago

OK, confirmed, it still does not work. On GPU the results still differ: the first run gives acc=0.5605 and the second acc=0.5655, so I still cannot reproduce identical results.

Anyway, thank you for your replies.

w5688414 commented 2 years ago

@HUSTHY

Result of the first run:

grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
2022-08-22 06:47:49,062 train_sentence_bert.py [line:93] INFO args: Namespace(batch_size=64, epochs=1, lr=1e-05, max_len=64, model_out='./output', pretrained='pretrained_models/paddle/bert-wwm-ext-chinese', task_type='classification', train_file='./data/paws_x/translated_train.tsv', val_file='./data/paws_x/dev_2k.tsv')
[2022-08-22 06:47:49,063] [    INFO] - Already cached /root/.paddlenlp/models/bert-wwm-ext-chinese/bert-wwm-ext-chinese-vocab.txt
[2022-08-22 06:47:49,075] [    INFO] - tokenizer config file saved in /root/.paddlenlp/models/bert-wwm-ext-chinese/tokenizer_config.json
[2022-08-22 06:47:49,076] [    INFO] - Special tokens file saved in /root/.paddlenlp/models/bert-wwm-ext-chinese/special_tokens_map.json
[2022-08-22 06:47:49,076] [    INFO] - Already cached /root/.paddlenlp/models/bert-wwm-ext-chinese/bert-wwm-ext-chinese.pdparams
W0822 06:47:49.078204 79030 gpu_context.cc:278] Please NOTE: device: 1, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
W0822 06:47:49.083241 79030 gpu_context.cc:306] device: 1, cuDNN Version: 7.6.
tokenization: 1000it [00:01, 923.86it/s]
tokenization: 1000it [00:01, 936.03it/s]
Epoch 0: CosineAnnealingDecay set learning rate to 1e-05.
2022-08-22 06:48:01,389 train_sentence_bert.py [line:113] INFO ***** Running training *****
2022-08-22 06:48:01,390 train_sentence_bert.py [line:114] INFO   Num examples = 16
2022-08-22 06:48:01,390 train_sentence_bert.py [line:115] INFO   Num Epochs = 1
[evaldation] 16/16 [==============================] 131.3ms/step  loss: 0.6809 Epoch 1: CosineAnnealingDecay set learning rate to 9.755282581475769e-06.
2022-08-22 06:48:10,099 train_sentence_bert.py [line:152] INFO save model
[2022-08-22 06:48:12,171] [    INFO] - tokenizer config file saved in ./output/paddle_2022-08-22/tokenizer_config.json
[2022-08-22 06:48:12,172] [    INFO] - Special tokens file saved in ./output/paddle_2022-08-22/special_tokens_map.json
2022-08-22 06:48:12,172 train_sentence_bert.py [line:158] INFO val_acc:0.5590------best_acc:0.5590

Result of the second run:

grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
2022-08-22 06:49:05,761 train_sentence_bert.py [line:93] INFO args: Namespace(batch_size=64, epochs=1, lr=1e-05, max_len=64, model_out='./output', pretrained='pretrained_models/paddle/bert-wwm-ext-chinese', task_type='classification', train_file='./data/paws_x/translated_train.tsv', val_file='./data/paws_x/dev_2k.tsv')
[2022-08-22 06:49:05,761] [    INFO] - Already cached /root/.paddlenlp/models/bert-wwm-ext-chinese/bert-wwm-ext-chinese-vocab.txt
[2022-08-22 06:49:05,774] [    INFO] - tokenizer config file saved in /root/.paddlenlp/models/bert-wwm-ext-chinese/tokenizer_config.json
[2022-08-22 06:49:05,775] [    INFO] - Special tokens file saved in /root/.paddlenlp/models/bert-wwm-ext-chinese/special_tokens_map.json
[2022-08-22 06:49:05,775] [    INFO] - Already cached /root/.paddlenlp/models/bert-wwm-ext-chinese/bert-wwm-ext-chinese.pdparams
W0822 06:49:05.777117 79056 gpu_context.cc:278] Please NOTE: device: 1, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
W0822 06:49:05.782191 79056 gpu_context.cc:306] device: 1, cuDNN Version: 7.6.
tokenization: 1000it [00:01, 869.93it/s]
tokenization: 1000it [00:01, 929.98it/s]
Epoch 0: CosineAnnealingDecay set learning rate to 1e-05.
2022-08-22 06:49:18,445 train_sentence_bert.py [line:113] INFO ***** Running training *****
2022-08-22 06:49:18,445 train_sentence_bert.py [line:114] INFO   Num examples = 16
2022-08-22 06:49:18,445 train_sentence_bert.py [line:115] INFO   Num Epochs = 1
[evaldation] 16/16 [==============================] 131.1ms/step  loss: 0.6809 Epoch 1: CosineAnnealingDecay set learning rate to 9.755282581475769e-06.
2022-08-22 06:49:27,084 train_sentence_bert.py [line:152] INFO save model
[2022-08-22 06:49:29,258] [    INFO] - tokenizer config file saved in ./output/paddle_2022-08-22/tokenizer_config.json
[2022-08-22 06:49:29,259] [    INFO] - Special tokens file saved in ./output/paddle_2022-08-22/special_tokens_map.json
2022-08-22 06:49:29,259 train_sentence_bert.py [line:158] INFO val_acc:0.5590------best_acc:0.5590

The results reproduce exactly on my side. My environment is:

HUSTHY commented 2 years ago

This is with the full dataset, not the 1000-example subset.

First training run:

2022-08-22 12:10:11,619 train_sentence_bert.py [line:91] INFO args: Namespace(batch_size=64, epochs=5, lr=1e-05, max_len=64, model_out='./output', pretrained='pretrained_models/paddle/bert-wwm-ext-chinese', task_type='classification', train_file='./data/paws_x/translated_train.tsv', val_file='./data/paws_x/dev_2k.tsv')
2022-08-22 12:11:28,773 train_sentence_bert.py [line:111] INFO Running training
2022-08-22 12:11:28,774 train_sentence_bert.py [line:112] INFO Num examples = 768
2022-08-22 12:11:28,775 train_sentence_bert.py [line:113] INFO Num Epochs = 5
2022-08-22 12:14:44,734 train_sentence_bert.py [line:150] INFO save model
2022-08-22 12:14:47,777 train_sentence_bert.py [line:156] INFO val_acc:0.5605------best_acc:0.5605
2022-08-22 12:18:00,986 train_sentence_bert.py [line:156] INFO val_acc:0.5605------best_acc:0.5605
2022-08-22 12:21:15,839 train_sentence_bert.py [line:156] INFO val_acc:0.5550------best_acc:0.5605
2022-08-22 12:24:32,991 train_sentence_bert.py [line:156] INFO val_acc:0.5460------best_acc:0.5605
2022-08-22 12:27:50,582 train_sentence_bert.py [line:156] INFO val_acc:0.5445------best_acc:0.5605

Second training run:

2022-08-22 12:45:39,841 train_sentence_bert.py [line:91] INFO args: Namespace(batch_size=64, epochs=5, lr=1e-05, max_len=64, model_out='./output', pretrained='pretrained_models/paddle/bert-wwm-ext-chinese', task_type='classification', train_file='./data/paws_x/translated_train.tsv', val_file='./data/paws_x/dev_2k.tsv')
2022-08-22 12:46:44,956 train_sentence_bert.py [line:111] INFO Running training
2022-08-22 12:46:44,956 train_sentence_bert.py [line:112] INFO Num examples = 768
2022-08-22 12:46:44,957 train_sentence_bert.py [line:113] INFO Num Epochs = 5
2022-08-22 12:49:47,558 train_sentence_bert.py [line:150] INFO save model
2022-08-22 12:49:50,183 train_sentence_bert.py [line:156] INFO val_acc:0.5655------best_acc:0.5655
2022-08-22 12:53:05,770 train_sentence_bert.py [line:156] INFO val_acc:0.5635------best_acc:0.5655
2022-08-22 12:56:29,093 train_sentence_bert.py [line:156] INFO val_acc:0.5525------best_acc:0.5655
2022-08-22 12:59:52,874 train_sentence_bert.py [line:156] INFO val_acc:0.5555------best_acc:0.5655
2022-08-22 13:03:15,798 train_sentence_bert.py [line:156] INFO val_acc:0.5540------best_acc:0.5655

HUSTHY commented 2 years ago

This is the comparison when training on 1000 examples.

First training run:

2022-08-22 14:59:04,602 train_sentence_bert.py [line:91] INFO args: Namespace(batch_size=64, epochs=5, lr=1e-05, max_len=64, model_out='./output', pretrained='pretrained_models/paddle/bert-wwm-ext-chinese', task_type='classification', train_file='./data/paws_x/translated_train.tsv', val_file='./data/paws_x/dev_2k.tsv')
W0822 14:59:04.653079 74201 gpu_resources.cc:61] Please NOTE: device: 1, GPU Compute Capability: 8.6, Driver API Version: 11.2, Runtime API Version: 11.2
W0822 14:59:04.681924 74201 gpu_resources.cc:91] device: 1, cuDNN Version: 8.1.
tokenization: 1000it [00:01, 828.41it/s]
tokenization: 1000it [00:01, 866.82it/s]
Epoch 0: CosineAnnealingDecay set learning rate to 1e-05.
2022-08-22 14:59:13,117 train_sentence_bert.py [line:111] INFO Running training
2022-08-22 14:59:13,117 train_sentence_bert.py [line:112] INFO Num examples = 16
2022-08-22 14:59:13,118 train_sentence_bert.py [line:113] INFO Num Epochs = 5
[evaldation] 16/16 [==============================] 76.7ms/step loss: 0.6914 Epoch 1: CosineAnnealingDecay set learning rate to 9.755282581475769e-06.
2022-08-22 14:59:20,583 train_sentence_bert.py [line:150] INFO save model
[2022-08-22 14:59:23,324] [ INFO] - tokenizer config file saved in ./output/paddle_2022-08-22/tokenizer_config.json
[2022-08-22 14:59:23,324] [ INFO] - Special tokens file saved in ./output/paddle_2022-08-22/special_tokens_map.json
2022-08-22 14:59:23,325 train_sentence_bert.py [line:156] INFO val_acc:0.5800------best_acc:0.5800
[evaldation] 16/16 [==============================] 77.6ms/step loss: 0.6702 Epoch 2: CosineAnnealingDecay set learning rate to 9.045084971874738e-06.
2022-08-22 14:59:27,950 train_sentence_bert.py [line:156] INFO val_acc:0.5790------best_acc:0.5800
[evaldation] 16/16 [==============================] 77.8ms/step loss: 0.6428 Epoch 3: CosineAnnealingDecay set learning rate to 7.938926261462366e-06.
2022-08-22 14:59:32,587 train_sentence_bert.py [line:156] INFO val_acc:0.5640------best_acc:0.5800
[evaldation] 16/16 [==============================] 79.0ms/step loss: 0.6212 Epoch 4: CosineAnnealingDecay set learning rate to 6.545084971874738e-06.
2022-08-22 14:59:37,367 train_sentence_bert.py [line:156] INFO val_acc:0.5510------best_acc:0.5800
[evaldation] 16/16 [==============================] 79.2ms/step loss: 0.6161 Epoch 5: CosineAnnealingDecay set learning rate to 5e-06.
2022-08-22 14:59:42,155 train_sentence_bert.py [line:156] INFO val_acc:0.5540------best_acc:0.5800

Second training run:

2022-08-22 15:00:54,722 train_sentence_bert.py [line:91] INFO args: Namespace(batch_size=64, epochs=5, lr=1e-05, max_len=64, model_out='./output', pretrained='pretrained_models/paddle/bert-wwm-ext-chinese', task_type='classification', train_file='./data/paws_x/translated_train.tsv', val_file='./data/paws_x/dev_2k.tsv')
W0822 15:00:54.733444 75341 gpu_resources.cc:61] Please NOTE: device: 1, GPU Compute Capability: 8.6, Driver API Version: 11.2, Runtime API Version: 11.2
W0822 15:00:54.736241 75341 gpu_resources.cc:91] device: 1, cuDNN Version: 8.1.
tokenization: 1000it [00:01, 945.41it/s]
tokenization: 1000it [00:01, 935.15it/s]
Epoch 0: CosineAnnealingDecay set learning rate to 1e-05.
2022-08-22 15:00:59,736 train_sentence_bert.py [line:111] INFO Running training
2022-08-22 15:00:59,736 train_sentence_bert.py [line:112] INFO Num examples = 16
2022-08-22 15:00:59,736 train_sentence_bert.py [line:113] INFO Num Epochs = 5
[evaldation] 16/16 [==============================] 76.9ms/step loss: 0.6915 Epoch 1: CosineAnnealingDecay set learning rate to 9.755282581475769e-06.
2022-08-22 15:01:06,179 train_sentence_bert.py [line:150] INFO save model
[2022-08-22 15:01:08,908] [ INFO] - tokenizer config file saved in ./output/paddle_2022-08-22/tokenizer_config.json
[2022-08-22 15:01:08,908] [ INFO] - Special tokens file saved in ./output/paddle_2022-08-22/special_tokens_map.json
2022-08-22 15:01:08,909 train_sentence_bert.py [line:156] INFO val_acc:0.5800------best_acc:0.5800
[evaldation] 16/16 [==============================] 75.8ms/step loss: 0.6702 Epoch 2: CosineAnnealingDecay set learning rate to 9.045084971874738e-06.
2022-08-22 15:01:13,462 train_sentence_bert.py [line:156] INFO val_acc:0.5790------best_acc:0.5800
[evaldation] 16/16 [==============================] 81.3ms/step loss: 0.6429 Epoch 3: CosineAnnealingDecay set learning rate to 7.938926261462366e-06.
2022-08-22 15:01:18,329 train_sentence_bert.py [line:156] INFO val_acc:0.5630------best_acc:0.5800
[evaldation] 16/16 [==============================] 77.8ms/step loss: 0.6211 Epoch 4: CosineAnnealingDecay set learning rate to 6.545084971874738e-06.
2022-08-22 15:01:23,064 train_sentence_bert.py [line:156] INFO val_acc:0.5510------best_acc:0.5800
[evaldation] 16/16 [==============================] 80.3ms/step loss: 0.6161 Epoch 5: CosineAnnealingDecay set learning rate to 5e-06.
2022-08-22 15:01:27,962 train_sentence_bert.py [line:156] INFO val_acc:0.5550------best_acc:0.5800

HUSTHY commented 2 years ago

With 1000 examples, the results of the first epoch are identical; with the full dataset they are not. My environment is the same as yours.

w5688414 commented 2 years ago

> With 1000 examples, the results of the first epoch are identical; with the full dataset they are not. My environment is the same as yours.

OK, the problem does exist; I have reproduced it. Thanks for the feedback, I will report it to the framework team.

HUSTHY commented 2 years ago

Is it a framework problem, or a problem with the BERT implementation in paddlenlp? Someone earlier said it is a backpropagation issue... not sure whether that is the case.

w5688414 commented 2 years ago

It may be a transformer issue. In chapter 8 of this course, with all seeds fixed, the results can be fully reproduced, but the final transformer model still shows a small diff: https://aistudio.baidu.com/aistudio/course/introduce/25793

HUSTHY commented 2 years ago

OK, thanks. I look forward to this being fixed.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

HUSTHY commented 1 year ago

Received! Thank you! -- Huang Yang

ZzyChris97 commented 1 year ago

May I ask, was this problem eventually resolved?

HUSTHY commented 1 year ago

Received! Thank you! -- Huang Yang

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.