-
Hi, first of all, great work. Really appreciate it.
I am trying to understand the work and reproduce it + test it on some new models. I am having some trouble understanding the given files.
1. T…
-
model: Llama-2-7b-hf
steps:
1. python3 converter.py --input "Llama-2-7b-hf/*.bin" --output /datasets/distserve/llama-7b --dtype float16 --model llama
2. python3 api_server/distserve_api_server.py --p…
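Once the server is up, a quick smoke test is to POST a completion request to it. Below is a minimal sketch; the route `/v1/completions`, the port `8000`, and the JSON fields are assumptions modeled on OpenAI-style serving frontends, not confirmed from this repo, so check `api_server/distserve_api_server.py` for the actual endpoint and schema.

```python
# Hypothetical smoke test for the serving endpoint.
# ASSUMPTIONS: route, port, and request fields are guesses; verify them
# against api_server/distserve_api_server.py before relying on this.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # hypothetical route and port
    json={
        "prompt": "Hello, my name is",
        "max_tokens": 32,
        "temperature": 0.8,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```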
-
Hi, I'm wondering if you could release your evaluation dataset for the GPT-3 generations, including PubMedQA, XSum, and WritingP (150 samples each). Given the randomness of OpenAI services, a shared eval…
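In the meantime, one way to cut down (though not eliminate) that randomness when regenerating outputs is to pin the sampling parameters. A minimal sketch with the openai Python SDK, assuming a chat model; note that `seed` is best-effort only, and even `temperature=0` does not guarantee identical outputs across runs or model updates:

```python
# Reduce run-to-run variance in OpenAI generations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption: substitute the model actually evaluated
    messages=[{"role": "user", "content": "Summarize the following abstract: ..."}],
    temperature=0,  # greedy-ish decoding
    seed=42,        # best-effort determinism; not a hard guarantee
)
print(resp.choices[0].message.content)
```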
-
I’m running an evaluation on the MMBench-en dataset. Evaluation on the MME benchmark went smoothly, but when I switched to MMBench-en, it slowed down significantly.
I’m using …
-
**What problem or use case are you trying to solve?**
Add SciCode Benchmark to OpenDevin's evaluation suite:
https://x.com/MinyangTian1/status/1813182904593199553
cc @mtian8 (lead author of the…
-
Hello, I believe there is a bug in your code on this line:
https://github.com/EleutherAI/lm-evaluation-harness/blob/543617fef9ba885e87f8db8930fbbff1d4e2ca49/lm_eval/models/openai_completions.py#L7…
-
Hi @ZhangYuanhan-AI
Thanks for the wonderful work. Just a question about the evaluation of the detailed description task: I found that the GPT eval score is converted to an int --- int(score), …
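For concreteness, here is a small sketch of why the cast matters: Python's `int()` truncates toward zero, so fractional judge scores are floored before averaging (the scores below are made up):

```python
# int() drops the fractional part, biasing the mean downward.
scores = [4.5, 3.7, 4.9]  # hypothetical GPT judge scores

mean_int = sum(int(s) for s in scores) / len(scores)  # (4 + 3 + 4) / 3 ≈ 3.67
mean_raw = sum(scores) / len(scores)                  # 13.1 / 3 ≈ 4.37
print(mean_int, mean_raw)
```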
-
Could you please tell me which OpenAI model you used for the MT-Bench evaluation?
gpt-3.5-turbo, or another one?
-
Hi, I have some doubts about the evaluation process described in the article. Does it first use the GPT generative model to generate t-SMILES sequences, and then reconstruct molecules based o…
-
With the release of the new SWE-bench evaluation harness last month, we have recently put forth a new set of submission requirements, detailed fully in the README and [here](https://www.swe…