deepset-ai / FARM

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0

Max token size? #854

Closed · kishb87 closed this issue 2 years ago

kishb87 commented 2 years ago

Question
I am currently using the Hugging Face deepset/roberta-base-squad2 model for question answering. From what I understand, RoBERTa has a maximum sequence length of 512 tokens, yet I am able to provide larger documents as context.

Can you provide some insight into how the model is able to do this? Does it split the context into multiple chunks, or is this a variant of RoBERTa that allows for longer sequences?

Timoeller commented 2 years ago

Hey sure, thanks for reaching out. We recently moved all our QA functionality to https://github.com/deepset-ai/haystack, so please ask future questions there.

About your question: yes, RoBERTa has a 512-token limit, but QA pipelines chunk the text into smaller, overlapping passages, run the model on each passage, and aggregate the answers across these chunks (or even across multiple documents).
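Here is a minimal sketch of that sliding-window mechanism, assuming the Hugging Face transformers library. The window sizes (384/128), the example inputs, and the simple max-logit aggregation are illustrative assumptions, not the exact implementation used in FARM or Haystack:

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")

question = "What is the token limit?"  # hypothetical example input
long_context = "some very long document text " * 500  # stands in for a >512-token document

# return_overflowing_tokens splits the context into overlapping windows:
# each window holds at most 384 tokens, and consecutive windows share
# 128 tokens of overlap so answers on a chunk boundary are not lost.
inputs = tokenizer(
    question,
    long_context,
    max_length=384,
    stride=128,
    truncation="only_second",  # never truncate the question itself
    return_overflowing_tokens=True,
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
    )

# Aggregate across windows: keep the span with the highest start+end logit.
# (A production implementation also masks question tokens and invalid spans.)
best_score, best_answer = None, None
for i in range(inputs["input_ids"].size(0)):
    start = int(outputs.start_logits[i].argmax())
    end = int(outputs.end_logits[i].argmax())
    score = float(outputs.start_logits[i][start] + outputs.end_logits[i][end])
    if best_score is None or score > best_score:
        span = inputs["input_ids"][i][start : end + 1]
        best_score = score
        best_answer = tokenizer.decode(span, skip_special_tokens=True)

print(best_answer)
```

In practice you rarely need to write this yourself: the transformers question-answering pipeline (and Haystack's readers) perform this windowing and aggregation automatically; the sketch just makes the mechanism explicit.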

Hope that helps!

kishb87 commented 2 years ago

It does. Thanks for the help. I'll definitely ask future questions over in the Haystack repo.