allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

LongformerForSequenceClassification explanation #207

Open Nick9214 opened 3 years ago

Nick9214 commented 3 years ago

Could someone explain what exactly this class does? Is it possible to get the classification output without pretraining? (Pretraining takes too long on a Colab GPU; I need something I can run there.)

kyouma commented 3 years ago

This class (and the other ...ForSequenceClassification classes) is used to get label logits, which can then be turned into probabilities with softmax. You must tokenize your text with a Tokenizer class instance and then pass the input_ids to the model. If you also pass the true labels, the model returns the loss value as well. I recommend loading a Longformer pretrained on any task and then fine-tuning the uninitialized classification head (and the other layers too) on your classification task. You might be able to fine-tune an uninitialized Longformer and get similar accuracy, but I haven't tried this.
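For reference, here is a minimal sketch of that workflow using the HuggingFace `transformers` API (assuming the public `allenai/longformer-base-4096` checkpoint and a hypothetical 2-label task; adjust `num_labels` and the input text to your own data):

```python
import torch
from transformers import LongformerTokenizer, LongformerForSequenceClassification

# Load a pretrained Longformer; the classification head on top is randomly
# initialized and needs fine-tuning on your own labels.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096", num_labels=2
)

# Tokenize the text; the tokenizer returns input_ids and attention_mask tensors.
inputs = tokenizer("Some long document ...", return_tensors="pt")

# Without labels: the model returns logits over the labels,
# which softmax turns into probabilities.
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)

# With labels: the model also returns a cross-entropy loss
# that you can backpropagate through when fine-tuning.
labels = torch.tensor([1])
outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits)
```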