google-research / bigbird

Transformers for Longer Sequences
https://arxiv.org/abs/2007.14062
Apache License 2.0

Why is ``last_idx`` set to 1024 even when the sequence length goes up to 4096? #18

Open Jeevesh8 opened 3 years ago

Jeevesh8 commented 3 years ago

I wonder why the last_idx variable here (the last index up to which blocks are chosen for random attention) is set to 1024 even when the sequence length increases to 4096. Is this an error, or am I misunderstanding something?
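To make the concern concrete, here is a minimal sketch of what I understand such a cap to do. It is my own simplification, not the repository's code: the function name `sample_random_blocks`, the `seed` argument, and the block size of 64 are assumptions made purely for illustration.

```python
import numpy as np


def sample_random_blocks(seq_length, block_size, num_rand_blocks,
                         last_idx=-1, seed=0):
    """Hypothetical helper: pick random key blocks for each query block.

    If last_idx is positive, only key blocks lying entirely within the
    first last_idx tokens are candidates for random attention.
    """
    rng = np.random.default_rng(seed)
    num_blocks = seq_length // block_size

    # Number of candidate key blocks after applying the cap (if any).
    if last_idx > 0:
        last_block = min(num_blocks, last_idx // block_size)
    else:
        last_block = num_blocks
    candidates = np.arange(last_block)

    # One row of distinct random block indices per query block.
    return np.stack([
        rng.choice(candidates, size=num_rand_blocks, replace=False)
        for _ in range(num_blocks)
    ])


# Sequence length 4096, block size 64: 64 blocks in total.
capped = sample_random_blocks(4096, 64, 3, last_idx=1024)
print(capped.shape)  # one row per query block
print(capped.max())  # always below 1024 // 64 = 16
```

Under these assumptions, a cap of 1024 means the random blocks are only ever drawn from the first 16 of the 64 blocks, so tokens beyond position 1024 would never be reached through random attention. That is the behaviour I am asking about.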

Thank you for your precious time. Yours gratefully