bigcode-project / octopack

🐙 OctoPack: Instruction Tuning Code Large Language Models
https://arxiv.org/abs/2308.07124
MIT License
431 stars, 27 forks

Create filter_v2.py #3

Closed: Muennighoff closed this 1 year ago

Muennighoff commented 1 year ago

Based on https://github.com/bigcode-project/commits/pull/1. The main differences are the loading code (which handles the case where the dataset is already downloaded locally) and the use of start / end words.
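For readers without the diff at hand, here is a rough sketch of what the two changes could look like. It is not the code from `filter_v2.py`. Everything in it is an assumption made for illustration: the word lists, the `message` field name, the JSON Lines layout of the local copy, and the reading of "start / end words" as "the first word must be in an allow-list and the last word must not be in a block-list".

```python
import json
import os

# Hypothetical word lists, chosen only for illustration.
START_WORDS = ("add", "fix", "update")
END_WORDS = ("wip",)


def load_local_jsonl(path):
    """Return the records of a local JSON Lines file, or None if it is missing."""
    if not os.path.exists(path):
        return None  # a caller would fall back to downloading (not shown)
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def keep_message(message, start_words=START_WORDS, end_words=END_WORDS):
    """Assumed rule: first word is an allowed start word, last word is not an end word."""
    words = message.lower().split()
    if not words:
        return False
    return words[0] in start_words and words[-1] not in end_words


def filter_records(records, field="message"):
    """Keep only the records whose (assumed) message field passes keep_message."""
    return [r for r in records if keep_message(r.get(field, ""))]
```

Under these assumptions, `filter_records(load_local_jsonl(path) or [])` would filter a local copy when one exists and return an empty list otherwise.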