-
The clickext class should be extended to detect whether a flag value is an HF path or a local path. Based on which type of argument is passed, there should be a log message and specific behavior for the d…
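A minimal sketch of the idea (not the project's actual clickext code): a custom `click.ParamType` that checks whether the value exists on disk, otherwise matches a rough `org/name` Hub pattern, and logs which branch was taken. The class name, return value, and regex are assumptions for illustration.

```python
import logging
import os
import re

import click

logger = logging.getLogger(__name__)

# Rough "org/name" pattern for a Hugging Face Hub repo id (assumption).
HF_REPO_RE = re.compile(r"^[\w.\-]+/[\w.\-]+$")


class ModelPathOrRepo(click.ParamType):
    name = "model_path_or_repo"

    def convert(self, value, param, ctx):
        if os.path.exists(value):
            logger.info("Treating %s as a local path", value)
            return ("local", value)
        if HF_REPO_RE.match(value):
            logger.info("Treating %s as a Hugging Face Hub repo id", value)
            return ("hf", value)
        self.fail(
            f"{value!r} is neither an existing local path nor an HF repo id",
            param,
            ctx,
        )
```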
-
Hello,
I want to express my gratitude for your outstanding work. The powerful lm-evaluation-harness and your continuous maintenance have made LLM evaluation much more convenient.
However, I hav…
-
**❗BEFORE YOU BEGIN❗**
Are you on discord? 🤗 We'd love to have you asking questions on discord instead: https://discord.com/invite/a3K9c8GRGt
**Describe the bug**
A clear and concise description …
-
Even when specifying a different `--output-path` via the CLI flag, the benchmark runs still produce some data under the folder `benchmark_output`. The folders produced are `benchmark_output/dialect` a…
-
Are there any plans to release detailed performance metrics for individual tasks from BBH and MMLU? I think it could be very valuable for research to be able to look at those individual task performan…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
vLLM has an issue where we can go OOM if too many `logprobs` are requested.
The reason t…
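A minimal sketch of the trigger as described, not a fix: in versions affected by this issue, asking for logprobs over a very large number of candidate tokens makes vLLM materialize a large logprob structure for every generated token, which can exhaust GPU memory. The model name and the `logprobs` value below are arbitrary assumptions for illustration.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

# Deliberately extreme logprobs request to illustrate the memory blow-up.
params = SamplingParams(max_tokens=32, logprobs=50_000)

outputs = llm.generate(["The capital of France is"], params)
```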
-
Hi!
There is a bug in the if/else statement that causes it to fail when the example is too long: `current_k_shot` may become `-1` in the `if` branch, and on the next iteration `get_prompt_from_dataframe…
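A hypothetical reconstruction of the pattern being described, with assumed names and signatures: the shot count is decremented whenever the assembled prompt is too long, and without a clamp it can reach `-1` before the next `get_prompt_from_dataframe` call.

```python
def build_prompt_with_k_shot(df, k_shot, max_input_len, get_prompt_from_dataframe, token_len):
    """Drop shots until the prompt fits (sketch under assumed helpers)."""
    current_k_shot = k_shot
    prompt = get_prompt_from_dataframe(df, current_k_shot)
    while token_len(prompt) > max_input_len:
        # Without the max(..., 0) clamp, a very long example drives
        # current_k_shot to -1 and the next call gets a negative shot count.
        current_k_shot = max(current_k_shot - 1, 0)
        prompt = get_prompt_from_dataframe(df, current_k_shot)
        if current_k_shot == 0:
            break
    return prompt
```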
-
Currently, users of `deepeval` can only create their own evaluation datasets/test cases. To better support users fine-tuning their models, `deepeval` should be able to import standard benchmarks such as M…
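A hypothetical sketch of what importing a standard benchmark could look like, assuming `deepeval`'s `LLMTestCase` / `EvaluationDataset` API; the benchmark, split, column names, and the `generate_answer` callable are illustrative assumptions, not a proposed interface.

```python
from datasets import load_dataset
from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase


def benchmark_to_dataset(generate_answer) -> EvaluationDataset:
    # Pull a public benchmark from the HF Hub and map its rows to test cases.
    rows = load_dataset("truthful_qa", "generation", split="validation")
    test_cases = [
        LLMTestCase(
            input=row["question"],
            actual_output=generate_answer(row["question"]),
            expected_output=row["best_answer"],
        )
        for row in rows
    ]
    return EvaluationDataset(test_cases=test_cases)
```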
-
Thanks for your great work! I cloned the `math_eval` directory and ran `run_7B_plus.sh` directly, and found performance gaps on some datasets.
| Model | TheoremQA | GPQA |…
-
Loading the med_qa_open and MMLU datasets does not work in source view.
(It works in thoughtsource view, so generating CoTs etc. is working.)