llm-as-judge Search Results

548 results for llm-as-judge


aws/fmeval #163

[Feature] LLM-based (QA Accuracy) eval algorithm

The metrics-based approaches in the `QAAccuracy` eval algorithm seem to harshly penalize verbose models (like Claude) on datasets with concise reference answers (like SQuAD). It'd be useful if this…

athewsey updated 9 months ago
2
witt3rd/tinytextgrad #2

When would I use TinyTextGrad instead of TextGrad?

I'm teaching a group of students about AI and am planning to cover TextGrad. Is there a reason I would cover TinyTextGrad instead?

DallanQ updated 3 months ago
2
ucl-dark/llm_debate #2

about <u_quote></u_quote>

Hi I have some issues about the tag . You mentioned in your prompt that tag will transfer to when the quotes don't pass verification through direct string matching. Sometimes, LLMs (like GPT4) ma…

tqzhong updated 3 months ago
6
huggingface/trl #2250

OOM when unwrap_model_for_generation

### System Info torch==2.4.0 transformers==4.43.4 trl==0.9.6 tokenizers==0.19.1 accelerate==0.32.0 peft==0.12.0 datasets==2.20.0 deepspeed==0.15.0 bitsandbytes==0.43.3 sentencepiece==0.2.0 …

hlnchen updated 23 hours ago
4
dodona-edu/dodona #5331

Automatically generate draft answers for student questions

With the increasing capabilities of LLMs, it is only a matter of time before they become powerful/cheap enough to use them inside Dodona. A first step might be to generate draft answers for questions …

bmesuere updated 3 weeks ago
3
Rollingpig/PaperWard #2

[Pipeline Discussion] How to fetch related papers?

Hello! Here is the current pipeline on how to fetch related papers from the web. All feedback/suggestions are welcomed! (1) Since there will be 600+ new papers listed each day and 2 millions in tot…

Rollingpig updated 1 week ago
11
ml-explore/mlx #135

Thoughts on Quantization Roadmap

I'm new to this specific project, and I don't say any of the following with high confidence. Things that I see as important for quantization: *Inference speed* - AWQ seems best on this front, t…

RonanKMcGovern updated 2 weeks ago
4
stanford-futuredata/ARES #66

Error while training LLM judge

When attempting to train an LLM judge I get the following error. ``` --------------------------------------------------------------------------- RuntimeError Tracebac…

WJ44 updated 2 months ago
4
huangd1999/EffiBench #2

Incorrect Testcases

Hello, There are issues with test cases for problems involving tree nodes. For example, problem 2096 has the below test case which is not correct and throws syntax error. `assert solution.getDir…

JDVPREDDY updated 1 week ago
2
KbsdJames/Omni-MATH #2

Generation Prompt Reference

Hi, can you please share the prompt you used to generate the answer? I hope to evaluate my model while keeping the prompts consistent, so that I can compare the results on the list.

BrenchCC updated 1 month ago
3

