-
We would like to evaluate model performance for various LLM fine-tuning approaches and compare them against standard benchmarks. An experiment we would like to try is:
- **Compare the full car…
-
Hi,
The wandb logger chokes if a group contains some tasks that output numbers and some that output strings. This is either a bug in `WandbLogger.log_eval_samples` or in the `openllm` group (maybe …
-
There seems to be a discrepancy between the [leaderboard](https://huggingface.co/spaces/uonlp/open_multilingual_llm_leaderboard) and this repository, which may mean that models were benchmar…
-
This is just a reminder: in environments where network access is restricted, **please remember to set the environment variable HF_DATASETS_OFFLINE to 1 to enable full offline mode**. This will prevent …
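A minimal sketch of what this looks like in practice, assuming the variable is set before `datasets` (or `lm_eval`) is imported so the flag is actually picked up; the `HF_HUB_OFFLINE` line is an extra assumption for fully offline Hub access, not something required by the note above:
```python
import os

# Set before importing `datasets` / `lm_eval`; the library reads its
# offline flag when it is imported.
os.environ["HF_DATASETS_OFFLINE"] = "1"
# Assumption: also block Hub lookups; drop this if model files should
# still be fetched from the Hub.
os.environ["HF_HUB_OFFLINE"] = "1"

import datasets  # imported after the env vars on purpose

# Loads only from the local cache and raises if the dataset was never downloaded.
mmlu = datasets.load_dataset("cais/mmlu", "anatomy")
```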
-
Hi, @haileyschoelkopf Thank you for your awesome open-source work. We have been evaluating with `lm-eval` and noticed that when using `accelerate` for data parallel inference, the number of GPUs utili…
-
### Describe the feature
How can I use OpenCompass to evaluate a local Alpaca model on MMLU and other datasets?
### Will you implement it?
- [ ] I would like to implement this feature and create a PR!
-
Hi,
While running `leaderboard_mmlu_pro` evals, I've noticed an unexpected space character. Here is an example request:
```
2024-09-25:06:46:53,199 INFO [evaluator_utils.py:200] Request: Insta…
```
-
When running the example run_spec `mmlu:subject=anatomy,model=openai/gpt2` with no caching, the HuggingFace client outputs the following warning on every call:
```
Setting `pad_token_id` to `…
```
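For context, this is the standard `transformers` warning emitted when `generate()` is called on a model such as GPT-2 that has no pad token configured. A standalone sketch of where it comes from and the usual way to silence it, not HELM's actual client code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The human skeleton contains", return_tensors="pt")

# GPT-2 has no pad token, so generate() warns and falls back to eos_token_id
# on every call. Passing it explicitly silences the warning without changing
# the generated output.
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Setting `model.generation_config.pad_token_id = tokenizer.eos_token_id` once should have the same effect without touching every `generate()` call.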
-
For multilingual ARC, the [original implementation](https://github.com/nlp-uoregon/mlmm-evaluation/blob/main/lm_eval/tasks/multilingual_arc.py) uses 25 shots, but in lm_eval it doesn't:
https://github.…
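For what it's worth, the shot count can be overridden at evaluation time while this is sorted out. A rough sketch, assuming the `lm_eval.simple_evaluate` Python API from recent releases and an illustrative task name (check the task list for the actual multilingual ARC task names):
```python
import lm_eval

# Assumptions: the task name below is only illustrative, and `simple_evaluate`
# accepts a `num_fewshot` override as in recent lm-evaluation-harness releases.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["arc_challenge"],  # replace with the relevant multilingual ARC task
    num_fewshot=25,           # match the 25-shot setup of the original implementation
)
print(results["results"])
```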
-
# Motivation
I want to use MMLU results by task to better understand the characteristics of LLMs. I am curious to see the differences between architectures and how performance on the tasks changes as…