Currently, we only use the first "sequence length" tokens of each instance in the data and discard the rest. We should use a "sliding window"-style technique so that training examples longer than the sequence length contribute all of their tokens.
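As a rough illustration of the idea, here is a minimal sketch of a sliding window over one tokenized instance. The function name `sliding_windows` and the `seq_len` and `stride` parameters are hypothetical and not taken from the existing code; it also assumes each instance is already a flat list of token IDs. A `stride` smaller than `seq_len` gives overlapping windows, and a final window is added when needed so the tail of the instance is not dropped.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[int], seq_len: int, stride: int) -> List[List[int]]:
    """Split one tokenized instance into windows of at most seq_len tokens.

    Hypothetical helper: names and parameters are assumptions for illustration.
    """
    if seq_len <= 0 or stride <= 0:
        raise ValueError("seq_len and stride must be positive")

    n = len(tokens)
    if n == 0:
        return []
    if n <= seq_len:
        # Short instances fit in a single window, as they do today.
        return [list(tokens)]

    # Full-length windows starting at 0, stride, 2 * stride, ...
    windows = [
        list(tokens[start:start + seq_len])
        for start in range(0, n - seq_len + 1, stride)
    ]

    # If the last regular window stops short of the end, add one more
    # window aligned to the end so the trailing tokens are still seen.
    last_start = n - seq_len
    if last_start % stride != 0:
        windows.append(list(tokens[last_start:]))

    return windows
```

For example, `sliding_windows(list(range(10)), seq_len=4, stride=4)` would produce the windows starting at positions 0 and 4, plus an end-aligned window starting at position 6. Whether windows should overlap, and by how much, is a design choice: overlap gives each token more left context in at least one window, at the cost of seeing some tokens more than once per epoch.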