google-research / bigbird

Transformers for Longer Sequences
https://arxiv.org/abs/2007.14062
Apache License 2.0

Details about warm start from RoBERTa’s checkpoint #16

Open RyanHuangNLP opened 3 years ago

RyanHuangNLP commented 3 years ago

How should the pretrained RoBERTa checkpoint be used for the warm start? In particular, I am unsure whether the pretrained position embeddings from RoBERTa are used.