-
Hi, first of all, great work. Really appreciate it.
I am trying to understand the work and reproduce it + test it on some new models. I am having some trouble understanding the given files.
1. T…
-
model: Llama-2-7b-hf
steps:
1. python3 converter.py --input "Llama-2-7b-hf/*.bin" --output /datasets/distserve/llama-7b --dtype float16 --model llama
2. python3 api_server/distserve_api_server.py --p…
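Once the server is up, a quick smoke test is to POST a completion request to it. Below is a minimal sketch; the route `/v1/completions`, the port `8000`, and the JSON fields are assumptions modeled on OpenAI-style serving frontends, not confirmed from this repo, so check `api_server/distserve_api_server.py` for the actual endpoint and schema.

```python
# Hypothetical smoke test for the serving endpoint.
# ASSUMPTIONS: route, port, and request fields are guesses; verify them
# against api_server/distserve_api_server.py before relying on this.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # hypothetical route and port
    json={
        "prompt": "Hello, my name is",
        "max_tokens": 32,
        "temperature": 0.8,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```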
-
Hi, I'm wondering if you could release your evaluation dataset for the GPT-3 generations, including PubMedQA, XSum, and WritingP (150 samples each). Given the randomness of OpenAI services, a shared eval…
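In the meantime, one way to cut down (though not eliminate) that randomness when regenerating outputs is to pin the sampling parameters. A minimal sketch with the openai Python SDK, assuming a chat model; note that `seed` is best-effort only, and even `temperature=0` does not guarantee identical outputs across runs or model updates:

```python
# Reduce run-to-run variance in OpenAI generations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption: substitute the model actually evaluated
    messages=[{"role": "user", "content": "Summarize the following abstract: ..."}],
    temperature=0,  # greedy-ish decoding
    seed=42,        # best-effort determinism; not a hard guarantee
)
print(resp.choices[0].message.content)
```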
-
I’m running an evaluation on the MMBench-en dataset. Evaluation on the MME benchmark went smoothly, but when I switched to MMBench-en, it slowed down significantly.
I’m using …
-
**What problem or use case are you trying to solve?**
Add SciCode Benchmark to OpenDevin's evaluation suite:
https://x.com/MinyangTian1/status/1813182904593199553
cc @mtian8 (lead author of the…
-
Hello, I believe there is a bug in your code on this line:
https://github.com/EleutherAI/lm-evaluation-harness/blob/543617fef9ba885e87f8db8930fbbff1d4e2ca49/lm_eval/models/openai_completions.py#L7…
-
Hi @ZhangYuanhan-AI
Thanks for the wonderful work. Just a question about the evaluation of the detailed description task: I found that the GPT eval score is converted to an int --- int(score), …
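For concreteness, here is a small sketch of why the cast matters: Python's `int()` truncates toward zero, so fractional judge scores are floored before averaging (the scores below are made up):

```python
# int() drops the fractional part, biasing the mean downward.
scores = [4.5, 3.7, 4.9]  # hypothetical GPT judge scores

mean_int = sum(int(s) for s in scores) / len(scores)  # (4 + 3 + 4) / 3 ≈ 3.67
mean_raw = sum(scores) / len(scores)                  # 13.1 / 3 ≈ 4.37
print(mean_int, mean_raw)
```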
-
Could you please tell me which OpenAI model you used for the MT-Bench evaluation?
gpt-3.5-turbo, or another one?
-
Hi, I have some doubts about the evaluation process described in the article. Does it first use the GPT generative model to generate t-SMILES sequences, and then reconstruct molecules based o…
-
With the release of the new SWE-bench evaluation harness last month, we have recently put forth a new set of submission requirements, detailed fully in the README and [here](https://www.swe…