exo-explore / exo

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
GNU General Public License v3.0

Wrong model referenced for Llama-3.1 70B for tinygrad inference engine? #191

Closed barsuna closed 2 weeks ago

barsuna commented 2 weeks ago

When selecting the Llama 3.1 70B model in tinychat (with the tinygrad inference engine), it maps to

NousResearch/Meta-Llama-3.1-70B

which appears to be the base (non-chat) model. Evidence for this is the following error that comes up when trying to use it:

No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.

In addition, the model is clearly confused about special tokens, so the above is not just a cosmetic message.
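One way to catch this up front is to check whether a tokenizer actually carries its own chat template before relying on `apply_chat_template`. A minimal sketch (the `has_chat_template` helper is hypothetical, not exo's code; it relies on the fact that Transformers stores the template on the tokenizer's `chat_template` attribute):

```python
def has_chat_template(tokenizer) -> bool:
    """Return True if the tokenizer ships its own chat template.

    Transformers keeps the Jinja template on `tokenizer.chat_template`;
    base (non-instruct) checkpoints like Meta-Llama-3.1-70B typically
    leave it unset, which is what triggers the legacy class-default
    fallback warning quoted above.
    """
    return getattr(tokenizer, "chat_template", None) is not None
```

With a tokenizer loaded via `AutoTokenizer.from_pretrained(...)`, a chat frontend could refuse (or warn loudly) when this returns False instead of silently falling back to the class-level default.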

It seems the correct model to use is

NousResearch/Meta-Llama-3.1-70B-Instruct

I verified it, and it produces better conversational responses. To be clear: I can't say whether it is the best Llama 3.1 70B out there, only that it doesn't have the chat-template / tokenization issues the default 70B model has.
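For context on why the base model gets confused: the Instruct checkpoints are trained on the Llama 3.1 chat format, which wraps each turn in header special tokens the base model never saw. A rough sketch of that rendering (simplified from the published prompt format; this is an illustration, not exo's or the tokenizer's actual template):

```python
def render_llama31_prompt(messages):
    # Simplified Llama 3.1 Instruct chat format: each turn is wrapped in
    # <|start_header_id|>role<|end_header_id|> ... <|eot_id|> markers.
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant header to cue the model to generate its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

A base model treats these markers as ordinary (or unknown) tokens, which matches the garbled special-token behavior seen with `NousResearch/Meta-Llama-3.1-70B`.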

AlexCheema commented 2 weeks ago

Thank you! Fixed in https://github.com/exo-explore/exo/commit/dc3b2bde39a5d9b59806bc5970a8fb7fe51b2c75