Add Sliding Windows To Dataset Pre-Processing

DRAGNLabs / 301r_retnet

Add Sliding Windows To Dataset Pre-Processing #7

Closed nprisbrey closed 5 months ago

nprisbrey commented 6 months ago

Currently we only use the first "sequence length" tokens of each instance in the data and discard the rest. We should use a "sliding window"-style technique so that training examples longer than the sequence length contribute all of their tokens.
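For reference, a minimal sketch of what a sliding window over one tokenized instance could look like. This is not code from the repository: the function name `sliding_windows` and the `stride` parameter are hypothetical, and it assumes each instance is already a flat list of token ids.

```python
from typing import List


def sliding_windows(token_ids: List[int], seq_len: int, stride: int) -> List[List[int]]:
    """Split one tokenized instance into windows of at most seq_len tokens.

    Hypothetical helper for illustration only (not the project's API).
    """
    if seq_len <= 0 or stride <= 0:
        raise ValueError("seq_len and stride must be positive")
    if not token_ids:
        return []
    # Short instances produce a single (shorter) window.
    if len(token_ids) <= seq_len:
        return [list(token_ids)]

    windows = []
    last_end = 0
    for start in range(0, len(token_ids) - seq_len + 1, stride):
        windows.append(token_ids[start:start + seq_len])
        last_end = start + seq_len

    # If the stride left trailing tokens uncovered, add one final window
    # aligned to the end of the instance so no tokens are dropped.
    if last_end < len(token_ids):
        windows.append(token_ids[-seq_len:])
    return windows


tokens = list(range(10))
overlapping = sliding_windows(tokens, seq_len=4, stride=3)
non_overlapping = sliding_windows(tokens, seq_len=4, stride=4)
```

With `stride < seq_len` consecutive windows overlap; with `stride == seq_len` they only overlap in the final, end-aligned window.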

nprisbrey commented 5 months ago

Closing because implementing this seems undesirable, and it might cause data leakage after shuffling.
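One way to read the leakage concern, sketched under assumptions: if windows from the same instance overlap and the windows are shuffled before being split, two windows that share tokens can land in different splits. The toy instance, the `shared_tokens` helper, and the split assignment below are all hypothetical and only illustrate the overlap; token ids are chosen to be unique so that a set intersection is meaningful.

```python
def shared_tokens(window_a, window_b):
    """Return the token ids that appear in both windows (illustrative helper)."""
    return sorted(set(window_a) & set(window_b))


# Toy instance with unique token ids so overlaps are easy to see.
instance = list(range(8))
seq_len, stride = 4, 2

# Overlapping windows taken from a single instance.
windows = [
    instance[start:start + seq_len]
    for start in range(0, len(instance) - seq_len + 1, stride)
]

# Suppose a shuffle-then-split step sends adjacent windows to different splits.
train_window, val_window = windows[0], windows[1]
leaked = shared_tokens(train_window, val_window)
```

Here `leaked` is non-empty, meaning the validation window contains tokens the model has already seen in training.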