-
# Summary
Currently we have two "eval" scripts for measuring the performance of LLMs after quantization: https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py,
https://github.com/pytorch/…
-
Hi, thanks for providing such a wonderful evaluation toolkit.
I was wondering why evaluation on `mmlu_generative` returns 0 accuracy for every model I try (pythia, qwen).
I understand it as …
-
node and docker
http://jdlm.info/articles/2016/03/06/lessons-building-node-app-docker.html?r=0
valve varoufakis
http://blogs.valvesoftware.com/economics/why-valve-or-what-do-we-need-corporations-for-a…
-
**I added `--tasks hendrycksTest*` to my command, but got this error:**
Selected Tasks: ['hendrycksTest-college_medicine', 'hendrycksTest-high_school_macroeconomics', 'hendrycksTest-security_…
-
Re: http://www.facebook.com/pauliewaulie/posts/189564251172698
Here's the initial plan:
1. Roll into BannedList plugin first _(can refactor later, though there's a lot of common ground, and I think w…
-
As I mentioned in #3, I think there are two major causes of confusion when it comes to "beating CAP." Number two is the difference between CAP "consistency" and application-level "consistency."
While…
-
https://github.com/hendrycks/test/pull/13
https://github.com/EleutherAI/lm-evaluation-harness/pull/497
Want to add Falcon 40B here:
![image](https://github.com/h2oai/h2ogpt/assets/6147661/4142104…
-
Hi @alexgreen,
We've got a bunch of priorities to start in on, so don't jump on this just yet -- I'm just adding it here to capture the need.
The "commenting guidelines" need to be re-implemented in…
-
Hi, I tried finetuning falcon-40b with QLoRA and comparing its performance with llama-65b, which was also finetuned with QLoRA; both were finetuned on the same dataset, oassist.
And I tried to compare its MMLU ev…
-
I'm curious about what this achieves feature-wise versus ungoogled-chromium. If it achieves more, then I wonder if the ungoogled-chromium projects would be interested in your patches. Here's a brainst…