-
The metrics-based approaches in the `QAAccuracy` eval algorithm seem to harshly penalize verbose models (like Claude) on datasets with concise reference answers (like SQuAD).
It'd be useful if this…
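To illustrate the concern, here is a minimal sketch of a SQuAD-style token-overlap score. It is not the `QAAccuracy` implementation; the function name `token_scores` and the example strings are hypothetical. It assumes simple lowercase whitespace tokenization and shows how a verbose but correct answer keeps full recall while its precision and F1 drop.

```python
from collections import Counter


def token_scores(prediction: str, reference: str) -> dict:
    """Token-level precision, recall, and F1 (hypothetical helper)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "Paris"
concise = token_scores("Paris", reference)
verbose = token_scores("The capital of France is Paris", reference)

print(concise)  # all three scores are 1.0
print(verbose)  # recall stays 1.0, precision is 1/6, F1 is 2/7
```

In this sketch, recall is unaffected by the extra words, so a recall-oriented score is one possible way to avoid penalizing verbosity when the reference answers are short.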
-
I'm teaching a group of students about AI and am planning to cover TextGrad. Is there a reason I would cover TinyTextGrad instead?
-
Hi
I have some issues about the tag . You mentioned in your prompt that tag will transfer to when the quotes don't pass verification through direct string matching.
Sometimes, LLMs (like GPT4) ma…
-
### System Info
torch==2.4.0
transformers==4.43.4
trl==0.9.6
tokenizers==0.19.1
accelerate==0.32.0
peft==0.12.0
datasets==2.20.0
deepspeed==0.15.0
bitsandbytes==0.43.3
sentencepiece==0.2.0
…
-
With the increasing capabilities of LLMs, it is only a matter of time before they become powerful/cheap enough to use them inside Dodona. A first step might be to generate draft answers for questions …
-
Hello! Here is the current pipeline for fetching related papers from the web. All feedback and suggestions are welcome!
(1) Since there will be 600+ new papers listed each day and 2 million in tot…
-
I'm new to this specific project, and I don't say any of the following with high confidence.
Things that I see as important for quantization:
*Inference speed*
- AWQ seems best on this front, t…
-
When attempting to train an LLM judge I get the following error.
```
---------------------------------------------------------------------------
RuntimeError Tracebac…
```
-
Hello,
There are issues with the test cases for problems involving tree nodes. For example, problem 2096 has the test case below, which is incorrect and throws a syntax error.
`assert solution.getDir…
-
Hi, can you please share the prompt you used to generate the answer? I hope to evaluate my model with the same prompts, so that my results are comparable with those on the list.