google-research / bigbird

Transformers for Longer Sequences
https://arxiv.org/abs/2007.14062
Apache License 2.0
563 stars, 101 forks

Are encoder and decoder both implemented with sparse attention? How long is the verified output length for the decoder? #30

Status: Open. Opened by dongxinghua 2 years ago.