NVIDIA / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0
738 stars 47 forks source link

Note about Mistral models #67

Open inflatebot opened 2 months ago

inflatebot commented 2 months ago

IDK if you guys have been using mistral-common to test Mistral's models, but if you haven't, there's a chance you haven't been forming your requests properly. The templates used by a lot of tooling have been subtly broken for a long time. It'd be worth checking the new document from Mistral's Cookbook and possibly reimplementing the tokenizer/templates used for those tests if they need it, and redoing the tests.

(We're all having a collective panic attack about this in the RP scene right now, because it means a huge chunk of our finetunes and merges are probably also broken!)