allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Longformer Encoder/Decoder model pretraining query #132

Open dwlmt opened 3 years ago

dwlmt commented 3 years ago

I'm really interested in the work you've done to adapt the Longformer to an encoder/decoder architecture, and I'd like to use it in some work I'm doing. My main motivation is the memory reduction from both the linear scaling of the Longformer attention and the built-in gradient checkpointing in your version, as I'm constrained to 12GB GPUs, which makes running standard BART-large with a reasonable batch size difficult.
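To illustrate the memory point, here's a minimal back-of-envelope sketch (my own, not from the repo) comparing the number of attention scores per head stored by full quadratic self-attention versus Longformer-style sliding-window attention. The sequence length 16384 matches the listed checkpoint, and the window size 512 is an assumption taken from the Longformer paper's default:

```python
def full_attention_cells(seq_len: int) -> int:
    """Attention scores per head for full (quadratic) self-attention."""
    return seq_len * seq_len

def sliding_window_cells(seq_len: int, window: int) -> int:
    """Approximate scores per head for a symmetric window of 2*window + 1 tokens."""
    return seq_len * (2 * window + 1)

n, w = 16384, 512  # checkpoint sequence length; assumed window size
ratio = full_attention_cells(n) / sliding_window_cells(n, w)
print(f"full attention stores ~{ratio:.0f}x more scores than windowed")
```

At 16k tokens the windowed variant stores roughly 16x fewer attention scores, which is where most of the memory headroom on a 12GB card comes from (gradient checkpointing then trades compute for further activation-memory savings).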

The question is: are the listed models, e.g. https://ai2-s2-research.s3-us-west-2.amazonaws.com/longformer/longformer-encdec-base-16384.tar.gz, pretrained for summarisation or just converted from the original BART models? My use case is general text generation, so I'd want the original BART weights. I just thought I'd check, since I see you also have a conversion script which I can use if those checkpoints are fine-tuned for summarisation.

ibeltagy commented 3 years ago

It is the original BART.