Currently, our model can train with a context of 2 million tokens (at 1B parameters) on a v3-8. However, our demo uses a context of only 4096 tokens (characters, at the time of writing), which is significantly shorter. Rather than showcasing such an unimpressive context, we could scale up and demonstrate few-shot learning from an entire book.
This issue tracks the progress of creating and deploying such a model.