allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Difference between this codebase and Huggingface? #210

Open · aleSuglia opened this issue 2 years ago

aleSuglia commented 2 years ago

Hi @ibeltagy,

I was planning to use Longformer as a backbone architecture for a domain other than NLP, training it from scratch on a different type of data. I am currently using the Hugging Face version of the model, which appears to have been created by you. I was wondering whether there is any concrete benefit to using this codebase instead of the HF one?
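For context, this is roughly how I am initializing an untrained model on the HF side (all hyperparameter values below are placeholders I would tune for my data, not recommendations from this repo):

```python
from transformers import LongformerConfig, LongformerModel

# Placeholder hyperparameters -- vocab size, sequence length, and
# attention window would need to be adapted to the (non-NLP) data.
config = LongformerConfig(
    vocab_size=8192,
    max_position_embeddings=4098,
    hidden_size=512,
    num_hidden_layers=6,
    num_attention_heads=8,
    attention_window=256,  # width of the local sliding-window attention
)

# Randomly initialized, no pretrained weights: trains from scratch.
model = LongformerModel(config)
```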

The only relevant information about this is reported in the HF documentation:

> The self-attention module `LongformerSelfAttention` implemented here supports the combination of local and global attention, but it lacks support for autoregressive attention and dilated attention. Autoregressive and dilated attention are more relevant for autoregressive language modeling than for finetuning on downstream tasks. A future release will add support for autoregressive attention, but support for dilated attention requires a custom CUDA kernel to be memory- and compute-efficient.
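For concreteness, the local + global pattern the docs refer to is controlled via `global_attention_mask` in the HF implementation. A minimal sketch (which tokens should attend globally is task-specific, so the choice below is just illustrative):

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("A long document ...", return_tensors="pt")

# 0 = local sliding-window attention, 1 = global attention.
# Here only the first token attends globally (as in the paper's
# classification setup); the right choice depends on the task.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
```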