Open duanhx1037 opened 3 months ago
Hi! What's going on here is that you're trying to use a metric, `acc`, that isn't set up to work with the `loglikelihood_rolling` output type. It's on our todo list to make metrics clearer and to prevent silent failures in cases like this.
When you insert `acc`, what do you expect to be computed? Token-level accuracy as a percentage?
Yes, what about token-level accuracy? It doesn't seem easy to modify this file correctly. At the very least, `output_type` and `doc_to_xxx` would need to be set properly.
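To make "token-level accuracy" concrete, here is a minimal sketch of one possible definition: the fraction of positions where the model's top-scoring token matches the reference token. This is not the harness's API. The function name `token_level_accuracy` and the toy logits are hypothetical, and it assumes you already have per-position logits aligned with the target token ids.

```python
from typing import Sequence


def token_level_accuracy(
    logits: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Fraction of positions where argmax(logits) equals the target token id.

    Hypothetical helper for illustration only; not part of lm-evaluation-harness.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    if not targets:
        return 0.0
    correct = 0
    for row, target in zip(logits, targets):
        # Greedy prediction: index of the largest logit at this position.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == target:
            correct += 1
    return correct / len(targets)


# Toy example: 4 positions, vocabulary of 3 tokens (made-up numbers).
toy_logits = [
    [0.1, 2.0, -1.0],  # greedy prediction: token 1
    [1.5, 0.2, 0.3],   # greedy prediction: token 0
    [0.0, 0.1, 3.0],   # greedy prediction: token 2
    [0.9, 0.8, 0.7],   # greedy prediction: token 0
]
toy_targets = [1, 0, 1, 0]

accuracy = token_level_accuracy(toy_logits, toy_targets)
print(f"token-level accuracy: {accuracy:.2%}")
```

Under this definition three of the four toy positions match, so the score is a per-token fraction rather than a per-document one, which is different from what `acc` computes for multiple-choice tasks.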
I am running a Llama 2 model on the wikitext dataset. I just wanted to try some other metrics, so I modified the default YAML file (`lm-evaluation-harness/lm_eval/tasks/wikitext/wikitext.yaml`) to the following, just deleting perplexity and adding `acc`. But when I run the evaluation again, I get an error like this:
Running script:
It seems the modification to the wikitext YAML file does not work.