-
The `helm|piqa` task listed in `tasks_table.jsonl` here: https://github.com/huggingface/lighteval/blob/a98210fd3a2d1e8bface1c32b72ebd5017173a4c/src/lighteval/tasks/tasks_table.jsonl#L797C1-L797C472.
…
-
If we are running classification tasks with an LLM, how can we calculate overall precision, recall, and F1 score from the evals?
It is not clear if derived metrics allow us to do that. Any suggestions?…
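
One possible approach, sketched under the assumption that each eval sample's gold label and the model's predicted label can be exported as two parallel lists of strings. The function name `precision_recall_f1` and the sample labels are illustrative only and are not part of any eval framework. The sketch counts true positives, false positives, and false negatives per class, then reports per-class, macro-averaged, and micro-averaged scores:

```python
from collections import Counter


def precision_recall_f1(y_true, y_pred):
    """Return (per_class, macro, micro) precision/recall/F1 for parallel label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gets a false positive
            fn[gold] += 1  # gold class gets a false negative

    def prf(tp_n, fp_n, fn_n):
        # Undefined ratios (zero denominator) are reported as 0.0.
        p = tp_n / (tp_n + fp_n) if tp_n + fp_n else 0.0
        r = tp_n / (tp_n + fn_n) if tp_n + fn_n else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    per_class = {c: prf(tp[c], fp[c], fn[c]) for c in labels}
    # Macro: unweighted mean of the per-class scores.
    macro = tuple(
        sum(scores[i] for scores in per_class.values()) / len(labels)
        for i in range(3)
    )
    # Micro: pool the counts over all classes before computing the ratios.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, macro, micro


# Hypothetical labels, for illustration only.
per_class, macro, micro = precision_recall_f1(
    ["pos", "pos", "neg", "neg"],
    ["pos", "neg", "neg", "neg"],
)
print(per_class, macro, micro)
```

Macro averaging treats every class equally, while micro averaging weights by sample count. For single-label classification, micro precision, recall, and F1 all equal plain accuracy, so the macro numbers are usually the more informative ones on imbalanced data.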
-
Hey everyone, this is an awesome project! However, while using it we found a small issue with the Nomad APM plugin.
Nomad now exposes the metrics below:
nomad.nomad.blocked_evals.cpu
nomad.nomad.bl…
-
Hi, I hit a bug when trying to access the embeddings from hyenaDNA, specifically with this code:
/evals/hg38_inference.py
Traceback (most recent call last):
File "/gpfs/gibbs/pi/zhao/tl688/hyena-dna/evals/hg…
-
Thanks for your brilliant work! Having downloaded the K400 pretrained checkpoint file (k400-probe.pth.tar) and modified the config YAML file for the corresponding dataset (specifying datapath), I ran evals.…
-
DSPy provides its own set of evaluation methods for evaluating compiled DSPy modules on dev sets, e.g., exact answer match and relevance. We can add these evaluations as annotations via `log_evaluatio…
-
Hi, cool project :)
I took a look at the evals and noticed that there are only 127 eval files. Further, only 107 of them seem to pass the tests.
Would it be possible for you to post the rest of th…
-
- Guide for assessing Food stamp app forms 2003
http://www.fns.usda.gov/sites/default/files/assessment-guide.pdf
- guide for assessing online apps http://www.fns.usda.gov/sites/default/files/snap/Be…
-
**Describe the bug**
I followed the example in MSDocs [Evaluate on test dataset using `evaluate()`](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/flow-evaluate-sdk#evaluate-on-test…