Does anyone have experience with running the model when there's more than 510 tokens?
Is the best way to chunk the text and then run it twice with the same questions (perhaps with a stride)?
Also, any idea how to run it with multiple questions at once?
Here are some ideas for increasing the sequence length, which may help with tasks that require longer inputs.
Let's say we want to increase the max sequence length to 1024, twice the original pre-trained sequence length. Duplicating the position embedding values from the range 0-511 into the range 512-1023 can be a simple and effective approach. This way, the second set of 512 position embeddings shares the same weight values as the first set.
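A minimal sketch of that duplication trick in PyTorch. It operates on a plain `nn.Embedding` standing in for the pre-trained position embedding table; in a real model the attribute path will differ (for example, Hugging Face's `BertModel` keeps it at `model.embeddings.position_embeddings`, and you would also need to update `config.max_position_embeddings`), so treat those names as assumptions:

```python
import torch

def extend_position_embeddings(old_emb: torch.nn.Embedding,
                               new_max_len: int) -> torch.nn.Embedding:
    """Build a larger position embedding table by tiling the old weights.

    Positions old_len..2*old_len-1 reuse the weights of positions 0..old_len-1,
    and so on, so the new table needs no training to produce sane values.
    """
    old_len, dim = old_emb.weight.shape
    new_emb = torch.nn.Embedding(new_max_len, dim)
    with torch.no_grad():
        # Copy the original weights block by block until the new table is full.
        for start in range(0, new_max_len, old_len):
            end = min(start + old_len, new_max_len)
            new_emb.weight[start:end] = old_emb.weight[: end - start]
    return new_emb

# Toy stand-in for BERT's 512 learned position embeddings:
old = torch.nn.Embedding(512, 768)
new = extend_position_embeddings(old, 1024)  # positions 512-1023 mirror 0-511
```

After swapping the table in, a short fine-tuning run on long inputs usually helps the model adapt to the repeated positions.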
Use RoPE (I actually tried this on a token classification task, but it greatly harmed performance) or ALiBi; related issue here.
Sliding window;
Here is an example. Suppose our training sequence length is 7 and our prediction length is 8; we can slide the available window across the predicted sequence, as shown in this image.
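The sliding-window idea also answers the original question about inputs longer than 510 tokens: split the token sequence into overlapping windows and run the model on each, then keep the highest-scoring answer across windows. A small sketch of the chunking step (pure Python; the `window` and `overlap` values are illustrative, not fixed by any library):

```python
def chunk_with_stride(token_ids, window=510, overlap=128):
    """Split a long token sequence into overlapping windows.

    Each window holds at most `window` tokens, and consecutive windows
    share `overlap` tokens, so an answer span near a chunk boundary still
    appears whole in at least one window.
    """
    step = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the sequence
        start += step
    return chunks

# Example: 10 tokens, windows of 4 with an overlap of 2.
parts = chunk_with_stride(list(range(10)), window=4, overlap=2)
# parts -> [[0,1,2,3], [2,3,4,5], [4,5,6,7], [6,7,8,9]]
```

For QA you would prepend the same question tokens to every chunk (which is why the per-chunk budget is 510 minus the question length) and compare answer scores across chunks.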