allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Longformer Encoder/Decoder model pretraining query #132

Open dwlmt opened 3 years ago

dwlmt commented 3 years ago

I'm really interested in the work you've done to adapt the Longformer to an encoder/decoder architecture, and I'd like to use it in some work I'm doing. My main motivation is the memory reduction from both the linear scaling of the Longformer attention and the built-in gradient checkpointing in your version, as I'm constrained to 12GB GPUs, which makes running standard BART-large with a reasonable batch size difficult.
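To illustrate the memory point, here's a minimal back-of-envelope sketch (my own, not from the repo) comparing the number of attention scores per head stored by full quadratic self-attention versus Longformer-style sliding-window attention. The sequence length 16384 matches the listed checkpoint, and the window size 512 is an assumption taken from the Longformer paper's default:

```python
def full_attention_cells(seq_len: int) -> int:
    """Attention scores per head for full (quadratic) self-attention."""
    return seq_len * seq_len

def sliding_window_cells(seq_len: int, window: int) -> int:
    """Approximate scores per head for a symmetric window of 2*window + 1 tokens."""
    return seq_len * (2 * window + 1)

n, w = 16384, 512  # checkpoint sequence length; assumed window size
ratio = full_attention_cells(n) / sliding_window_cells(n, w)
print(f"full attention stores ~{ratio:.0f}x more scores than windowed")
```

At 16k tokens the windowed variant stores roughly 16x fewer attention scores, which is where most of the memory headroom on a 12GB card comes from (gradient checkpointing then trades compute for further activation-memory savings).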

The question is: are the listed models, e.g. https://ai2-s2-research.s3-us-west-2.amazonaws.com/longformer/longformer-encdec-base-16384.tar.gz, pretrained for summarisation or just converted from the original BART models? My use case is general text generation, so I'd want the original BART weights. I just thought I'd check, since I see you also have a conversion script which I can use if those checkpoints are fine-tuned for summarisation.

ibeltagy commented 3 years ago

It is the original BART.