MichiganDataScienceTeam / F24-mini-copilot

Building and deploying a lightweight code autocompletion tool, from GPT-2 weights to a working VSCode extension.
MIT License

Chunk the dataset input string into a fixed number of fixed-length chunks #24

Closed michaeljcliao closed 3 weeks ago

michaeljcliao commented 4 weeks ago

This function takes a string of tokens and splits it into chunks of a fixed length. If the number of resulting chunks exceeds a set limit, only the first chunks up to that limit are returned. Consecutive chunks overlap by a fixed number of tokens.
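
For reference, here is a minimal sketch of the behavior described above. It is not the code from this PR. The name `chunk_tokens` and the parameters `chunk_len`, `overlap`, and `max_chunks` are hypothetical, and it assumes the input has already been tokenized into a sequence of token IDs. It also assumes the final chunk may be shorter than `chunk_len` when the tokens run out; padding or dropping that chunk is left out.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[int],
    chunk_len: int,
    overlap: int,
    max_chunks: int,
) -> List[List[int]]:
    """Split tokens into overlapping chunks, keeping at most max_chunks.

    All names here are illustrative assumptions, not the PR's actual API.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    if max_chunks <= 0:
        raise ValueError("max_chunks must be positive")

    # Each new chunk starts this many tokens after the previous one,
    # so consecutive chunks share exactly `overlap` tokens.
    stride = chunk_len - overlap

    chunks: List[List[int]] = []
    for start in range(0, len(tokens), stride):
        # Stop once the chunk limit is reached; later tokens are discarded.
        if len(chunks) == max_chunks:
            break
        chunks.append(list(tokens[start:start + chunk_len]))
        # Stop once a chunk has reached the end of the input.
        if start + chunk_len >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Ten tokens, chunks of 4 with 1 token of overlap, capped at 2 chunks.
    print(chunk_tokens(list(range(10)), chunk_len=4, overlap=1, max_chunks=2))
```

Using a stride of `chunk_len - overlap` keeps the overlap logic in one place. Requiring `overlap < chunk_len` guarantees the stride is positive, so the loop always advances.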