HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU-Training
BSD 2-Clause "Simplified" License

Long-Context Experiments #29

Open ClashLuke opened 2 years ago

ClashLuke commented 2 years ago

Currently, our model can train with a context of 2 million tokens (at 1B parameters) on a v3-8. However, our demo uses a context of only 4096 tokens (characters, at the time of writing), which is significantly shorter. Instead of showcasing such an unimpressive context, we could scale up and demonstrate that the model can few-shot learn from an entire book.
This issue tracks the progress of creating and deploying such a model.
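
As a rough illustration of what "few-shot learning from an entire book" would look like at the input level, here is a minimal sketch of building a book-length, character-level prompt that fills the 2M-token context. The names (`CONTEXT`, `book_to_tokens`, `build_prompt`, `moby_dick.txt`) and the byte-per-character tokenization are assumptions for illustration, not the repo's actual preprocessing code:

```python
import numpy as np

# Hypothetical character-level vocabulary: one token per byte, matching the
# character-level tokenization mentioned above (assumption, not repo code).
CONTEXT = 2_097_152  # ~2M tokens, the context length cited for the 1B model


def book_to_tokens(path: str) -> np.ndarray:
    """Read a book as raw bytes and treat each byte as a token id."""
    with open(path, "rb") as f:
        data = f.read()
    return np.frombuffer(data, dtype=np.uint8).astype(np.int32)


def build_prompt(book_path: str, query: str) -> np.ndarray:
    """Concatenate an entire book with a query, then trim/left-pad to CONTEXT."""
    query_tokens = np.frombuffer(query.encode("utf-8"), dtype=np.uint8).astype(np.int32)
    tokens = np.concatenate([book_to_tokens(book_path), query_tokens])
    if tokens.size > CONTEXT:
        tokens = tokens[-CONTEXT:]  # keep the most recent tokens if the book overflows
    return np.pad(tokens, (CONTEXT - tokens.size, 0))  # left-pad to the full context


# Example (hypothetical file): feed a whole novel plus a question to the model.
# prompt = build_prompt("moby_dick.txt", "\nQ: Who narrates the story?\nA:")
```

The point of the sketch is that a full novel (roughly 1M characters) plus the few-shot query fits comfortably inside the 2M-token window, so the demo could condition on the entire book rather than a 4096-character snippet.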