ECP-CANDLE / Foundation

MIT License

Implement loading models from huggingface; initially only for pre-training #1

Closed: azton closed this issue 1 year ago

azton commented 1 year ago

nanoGPT is great, but it would be nice to experiment with other model variants from Hugging Face.

azton commented 1 year ago

Implemented GPT2 and GPTNeoX. It turns out something was going wrong in the nanoGPT setup: these models, unlike it, actually train in a meaningful way. (I probably broke something in nanoGPT when making the changes for FSDP.)
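
For anyone reading later, here is a minimal sketch of what building these two models for pre-training could look like. It is not the code from this repository: `MODEL_REGISTRY` and `build_model` are hypothetical names, and the only assumption about the library is that `transformers` exposes `GPT2Config`, `GPT2LMHeadModel`, `GPTNeoXConfig`, and `GPTNeoXForCausalLM`. The model is constructed from a config rather than with `from_pretrained`, so the weights start from random initialisation, which is what pre-training needs.

```python
import importlib

# Hypothetical registry (not from this repo): short name -> (config class, model class)
# as exported by the `transformers` package.
MODEL_REGISTRY = {
    "gpt2": ("GPT2Config", "GPT2LMHeadModel"),
    "gpt_neox": ("GPTNeoXConfig", "GPTNeoXForCausalLM"),
}


def build_model(name, **config_overrides):
    """Return a randomly initialised causal LM suitable for pre-training."""
    if name not in MODEL_REGISTRY:
        raise ValueError(
            f"unknown model {name!r}; choose from {sorted(MODEL_REGISTRY)}"
        )
    config_cls_name, model_cls_name = MODEL_REGISTRY[name]
    # Imported lazily so the registry can be inspected without transformers installed.
    transformers = importlib.import_module("transformers")
    # Keyword overrides are passed straight to the config constructor.
    config = getattr(transformers, config_cls_name)(**config_overrides)
    # Constructing from a config (not from_pretrained) gives fresh, untrained weights.
    return getattr(transformers, model_cls_name)(config)
```

With `transformers` and `torch` installed, a small test model could be created with something like `build_model("gpt2", n_layer=2, n_head=2, n_embd=64)` or `build_model("gpt_neox", num_hidden_layers=2, hidden_size=64, num_attention_heads=2)`. Wrapping the result for FSDP is left out here on purpose.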