Hi! Nice work on the Longformer model! I am studying your model and have a couple of questions:

It seems the code in `LongformerSelfAttention` never enables autoregressive mode: I noticed that the `autoregressive` parameter is always set to `False` when calling `diagonaled_mm_tvm`. If this is a bug, could you please fix it?

Does this code support relative position embeddings? The paper mentions that RPE is used in the autoregressive LM. If not, could you please release that part of the code?
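To make sure I understand what the `autoregressive` flag should change, here is a toy sketch of the sliding-window attention pattern in plain NumPy (the function name and shapes are mine, not from your repo): in autoregressive mode each token should only attend to itself and the previous `w` tokens, while the bidirectional mode attends `w` tokens on each side.

```python
import numpy as np

def sliding_window_mask(seq_len, w, autoregressive):
    # mask[i, j] is True if token i may attend to token j
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    if autoregressive:
        # causal: only itself and the previous w tokens
        return (j <= i) & (j >= i - w)
    # bidirectional: w tokens on each side
    return np.abs(i - j) <= w
```

Is this the pattern that passing `autoregressive=True` to `diagonaled_mm_tvm` is meant to produce?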