Question about how to load the pre-trained model

Ethan-yt / guwenbert

GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese)

Apache License 2.0

488 stars 40 forks

Question about how to load the pre-trained model #14

Closed: renjunxiang closed this issue 3 years ago

renjunxiang commented 3 years ago

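Thank you very much for open-sourcing this project. I ran into a small question while using it. Chinese RoBERTa models are usually loaded with BertModel, for example https://github.com/ymcui/Chinese-BERT-wwm, where model_type in the config is bert. You use RobertaModel instead. Could you explain why you chose this approach?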

Ethan-yt commented 3 years ago

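This is because the models used for pre-training are different. Since BERT and RoBERTa have similar architectures, their RoBERTa was still trained on top of the BERT model. Our approach was to train directly with the RoBERTa model, so the checkpoint also has to be loaded with RobertaModel.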

renjunxiang commented 3 years ago

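Thank you for your answer. As far as I know, the only differences between RoBERTa and BERT are that pre-training drops NSP and uses dynamic masking, so the model architecture should be the same. If you trained with the HuggingFace scripts, the official ones already drop NSP and apply MLM masking dynamically. I just re-checked the architectures, and the only difference is in token_type_embeddings: BERT has 2 and RoBERTa has 1. Because the effect of token_type_embeddings in QA is still unclear, I think it is worth considering BertModel. Otherwise, QA-type tasks have no way to encode the query separately. This is just a personal suggestion for your reference.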
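To make the token_type_embeddings point concrete, here is a minimal pure-Python sketch that is not taken from either library. It models the type embedding as a lookup table with type_vocab_size rows (2 in the BERT case, 1 in the RoBERTa case, as described in the comment above). The function names, the hidden size of 4, and the zero-filled vectors are illustrative assumptions.

```python
from typing import List


def make_type_table(type_vocab_size: int, hidden_size: int = 4) -> List[List[float]]:
    """Build a toy token-type embedding table (hypothetical, zero-initialised)."""
    return [[0.0] * hidden_size for _ in range(type_vocab_size)]


def lookup_types(table: List[List[float]], token_type_ids: List[int]) -> List[List[float]]:
    """Return one embedding row per type id; raise IndexError if an id has no row."""
    for type_id in token_type_ids:
        if not 0 <= type_id < len(table):
            raise IndexError(
                f"type id {type_id} out of range for a table with {len(table)} rows"
            )
    return [table[type_id] for type_id in token_type_ids]


bert_like = make_type_table(type_vocab_size=2)     # two segment types: 0 and 1
roberta_like = make_type_table(type_vocab_size=1)  # a single segment type: 0

pair_ids = [0, 0, 0, 1, 1]  # query tokens as segment 0, passage tokens as segment 1

bert_rows = lookup_types(bert_like, pair_ids)       # works: both ids have a row
roberta_rows = lookup_types(roberta_like, [0] * 5)  # works: every token uses id 0

try:
    lookup_types(roberta_like, pair_ids)
    roberta_accepts_segment_one = True
except IndexError:
    roberta_accepts_segment_one = False  # id 1 has no row in a one-row table
```

In this toy model, a one-row table can only serve inputs whose type ids are all 0, which is the limitation the comment is pointing at.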

Ethan-yt commented 3 years ago

@CZWin32768

CZWin32768 commented 3 years ago

Thanks for your interest, @renjunxiang. The type embedding is not a necessary component of pre-trained LMs (see the RoBERTa paper). The Hugging Face implementation keeps the type embedding only for compatibility. For the QA task, you can directly concatenate the passage-query pair without type embeddings.
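The concatenation described in this reply could look roughly like the pure-Python sketch below. It is an illustration under stated assumptions, not the project's preprocessing code: the special tokens [CLS] and [SEP], the character-level split standing in for a real tokenizer, the query-first ordering, and the helper names concat_pair and passage_start are all hypothetical.

```python
from typing import Dict, List

# Hypothetical special tokens; the real ones depend on the tokenizer in use.
CLS, SEP = "[CLS]", "[SEP]"


def concat_pair(query: str, passage: str) -> Dict[str, List]:
    """Join a query and a passage into one sequence with every type id set to 0."""
    # A character-level split stands in for a real tokenizer (an assumption).
    tokens = [CLS] + list(query) + [SEP] + list(passage) + [SEP]
    return {
        "tokens": tokens,
        "token_type_ids": [0] * len(tokens),  # no second segment type is used
    }


def passage_start(tokens: List[str]) -> int:
    """Locate the passage from the first separator rather than from type ids."""
    return tokens.index(SEP) + 1


example = concat_pair("何人", "孔子曰")
start = passage_start(example["tokens"])
passage_tokens = example["tokens"][start:-1]  # drop the trailing separator
```

Here the separator token is the only boundary signal between the two texts, so downstream code that needs the passage span finds it from the separator position instead of from segment ids.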

renjunxiang commented 3 years ago

Thank you for your answer!