-
# Summary
Currently we have two "eval" scripts for measuring the performance of LLMs after quantization: https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py,
https://github.com/pytorch/…
-
For example, my high-level idea is "what I did today", and my low-level representation of it is:
```
list(
"morning"=c('breakfast', 'tv watching'),
"afternoon"=c('eating lunch')
)
```
-
Hi, thanks for providing such a wonderful evaluation toolkit.
I was wondering why evaluation on `mmlu_generative` returns 0 accuracy no matter which model I try (pythia, qwen).
I understand it as …
-
# Asking
- [x] Ask ChatGPT for help completing this homework
# Feeling, Writing, Thinking
- [x] Post an experience report (a paragraph or several) as a comment on this issue. This experience repo…
-
https://github.com/hendrycks/test/pull/13
https://github.com/EleutherAI/lm-evaluation-harness/pull/497
Want to add Falcon 40B here:
![image](https://github.com/h2oai/h2ogpt/assets/6147661/4142104…
-
How do we present the "monetary policy of the core devs"? This is something that I'm putting in 'airquotes' for now, but I'm thinking it's a convo we'll need to have as this progresses.
-
@llorracc has recommended looking into [bellman](https://bellman.dev/docs/latest/index.html), a toolkit for model-based reinforcement learning (MBRL), as inspiration for HARK.
What is `bellman`? It…
-
**I added `--tasks hendrycksTest*` to my command, but got this error:**
Selected Tasks: ['hendrycksTest-college_medicine', 'hendrycksTest-high_school_macroeconomics', 'hendrycksTest-security_…
-
Hi,
I mentioned on the QE discourse that I was looking at porting some of the Python code to Julia and was told to open an issue.
Listed below are lectures that could potentially be ported ove…
-
First posted here: https://forum.obsidian.md/t/kindle-highlights-plugin-adding-tags-from-isbn-or-amazon-categories/41160
**Is your feature request related to a problem? Please describe.**
I'd like…