Hi Meta team, we just added MobileLLM support to GPTQModel for 4-bit GPTQ quantization. However, we are running into a problem while trying to establish (recreate) the baseline native/bf16 benchmark values using lm-eval. lm-eval runs, but the scores are horrendous relative to other 1B models, even Llama 3.2 1B. We suspect there is an issue with the tokenizer.

We are also running into a problem running the model with pure HF transformers: the output just repeats the same response, as if EOS is never generated.
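For reference, here is a minimal sketch of the kind of generation call we are using. The checkpoint id and the `trust_remote_code` flag are assumptions on our side; please correct us if the intended usage is different:

```python
# Minimal generation sketch (assumption: checkpoint id "facebook/MobileLLM-1B";
# trust_remote_code only matters if the hub repo ships custom modeling/tokenizer code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/MobileLLM-1B"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    # Passed explicitly in case the generation config is missing them;
    # without a valid eos_token_id the model never stops generating.
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```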
We suspect these two issues are related, as the repeating output is degrading the lm-eval benchmark scores.
Environment:

```
Name: transformers
Version: 4.46.1
```
What is the correct way to generate output with MobileLLM and configure the tokenizer? Can you provide a few sample inputs/outputs so we can verify whether this is related to the model code, the config, or the tokenizer?
Even better, which tool is Meta using to generate its benchmark results? As of now, lm-eval results put this model well below Llama 3.2 1B on multiple fronts.
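For completeness, this is roughly how we produce the bf16 baseline numbers with lm-eval (lm-evaluation-harness). The checkpoint id and task list here are our assumptions, not necessarily the set behind the published numbers:

```python
# Sketch of our bf16 baseline run via the lm-eval Python API.
# Checkpoint id, tasks, and batch size are assumptions for illustration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=facebook/MobileLLM-1B,dtype=bfloat16,trust_remote_code=True",
    tasks=["arc_easy", "arc_challenge", "hellaswag", "winogrande"],
    num_fewshot=0,
    batch_size=16,
)
print(results["results"])  # per-task metric dict
```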
@liuzechun