MichiganDataScienceTeam / F24-mini-copilot

Building and deploying a lightweight code autocompletion tool, from GPT-2 weights to a working VSCode extension.
MIT License

Merge dataset into dev #31

Closed USSiamaboat closed 2 weeks ago

USSiamaboat commented 2 weeks ago

Force pushed to undo bad merge

USSiamaboat commented 2 weeks ago

The new dataset.py breaks train.py, which relies on the old dataset.py, but train.py doesn't run anyway

danaiamirali commented 2 weeks ago

Also tokenizer_10m/ should not be getting pushed to git. Maybe a tarball of that folder could, but the actual folder should not be getting pushed

USSiamaboat commented 2 weeks ago

> Also tokenizer_10m/ should not be getting pushed to git. Maybe a tarball of that folder could, but the actual folder should not be getting pushed

Implemented the tarball approach for now. Where would the files go if we don't put them on GitHub?
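
For reference, a minimal sketch of what packing the tokenizer folder into a tarball could look like, using only Python's standard `tarfile` module. The function names `pack_dir` and `unpack_dir` and the archive name are hypothetical, chosen for illustration; this is not necessarily how the PR implemented it.

```python
import tarfile
from pathlib import Path


def pack_dir(src_dir: str, archive_path: str) -> Path:
    """Compress src_dir into a gzip tarball at archive_path (hypothetical helper)."""
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    out = Path(archive_path)
    with tarfile.open(out, "w:gz") as tar:
        # arcname stores only the folder name, not the full path, in the archive
        tar.add(src, arcname=src.name)
    return out


def unpack_dir(archive_path: str, dest_dir: str = ".") -> None:
    """Extract a gzip tarball into dest_dir (hypothetical helper)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```

Under these assumptions, `pack_dir("tokenizer_10m", "tokenizer_10m.tar.gz")` would produce the archive and `unpack_dir("tokenizer_10m.tar.gz")` would restore the folder. Listing `tokenizer_10m/` in `.gitignore` keeps the unpacked folder out of git.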

danaiamirali commented 2 weeks ago

Typically we would use AWS S3 for something like this, but we can use Drive for our project. I gave you access to a data folder in the Mini Copilot Drive folder