allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Does Longformer predict the answer span on WikiHop dataset? #93

Closed sjy1203 closed 4 years ago

sjy1203 commented 4 years ago

Hi, your paper says:

For WikiHop and TriviaQA we follow the simple QA model of BERT (Devlin et al., 2019), and concatenate question and documents into one long sequence

WikiHop uses a classification layer for the candidate while TriviaQA uses the loss function of Clark and Gardner (2017) to predict answer span.

Does this mean Longformer predicts an answer span on WikiHop, the same way it does on TriviaQA?

ibeltagy commented 4 years ago

WikiHop questions are multiple-choice, so it is a multiclass classification problem, not a span prediction task.

sjy1203 commented 4 years ago

Thanks for your quick reply.

I see. So how does Longformer handle a multiclass classification problem such as WikiHop?

The paper is not very clear when it says:

WikiHop uses a classification layer for the candidate

Does Longformer run multiple 0/1 classifications, concatenating the [CLS] output of the question and documents with each candidate? Or ...?

matt-peters commented 4 years ago

We encode the question and the candidate answer choices as [q] question [/q] [ent] candidate1 [/ent] ... [ent] candidateN [/ent]. We then attach a linear layer with a single output score (R^1024 -> R^1) to each [ent] token, concatenate the scores for all candidates, apply a softmax, and use cross-entropy loss against the correct candidate. More details are in Appendix B (https://arxiv.org/pdf/2004.05150.pdf).
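
For readers who want to see the scoring step concretely, here is a minimal plain-Python sketch of the head described above. It is not the repository's code. The function names (build_input, candidate_scores, softmax, cross_entropy) are hypothetical, the hidden size is shrunk from 1024 to 4 for readability, the [ent] hidden states are made-up numbers rather than model outputs, and it assumes the same linear layer is shared across all [ent] tokens.

```python
import math

def build_input(question, candidates):
    """Format the question and candidates with the markers quoted in this thread."""
    parts = ["[q] " + question + " [/q]"]
    parts += ["[ent] " + c + " [/ent]" for c in candidates]
    return " ".join(parts)

def candidate_scores(ent_states, weight, bias):
    """Map each [ent] hidden state to one scalar score (R^d -> R^1).
    Assumption: one linear layer is shared by all candidates."""
    return [sum(w * h for w, h in zip(weight, state)) + bias
            for state in ent_states]

def softmax(scores):
    """Numerically stable softmax over the concatenated candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, gold_index):
    """Negative log-probability of the correct candidate."""
    return -math.log(softmax(scores)[gold_index])

# Toy data: 3 candidates, hidden size 4 (illustrative values only).
text = build_input("Where was X born?", ["Paris", "Rome", "Oslo"])
ent_states = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 0.0, -1.0, 0.5],
    [0.0, 0.5, 0.5, 0.0],
]
weight = [0.5, -0.25, 0.1, 0.2]
bias = 0.0

scores = candidate_scores(ent_states, weight, bias)
probs = softmax(scores)
loss = cross_entropy(scores, gold_index=1)
# At inference time the prediction is the highest-scoring candidate.
predicted = max(range(len(scores)), key=scores.__getitem__)
```

Because every candidate gets one score and a single softmax normalizes over all of them, this is one N-way classification rather than N independent 0/1 decisions.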

sjy1203 commented 4 years ago

Thanks

ibeltagy commented 4 years ago

Closing. Please feel free to reopen or create a new issue if you have other questions.