gpt-evaluation Search Results

1000+ results
for gpt-evaluation


jameszhou-gl/gpt-4v-distribution-shift #9

Hi, your OpenAI API key is public and searchable

Don't put API keys in public repos.

skermiebroTech updated 1 month ago
1
infinigence/LVEval #3

Updated Benchmark Results, like GPT-4o, LLaMA 3.1, and Qwen …

Thank you for the great work! I've noticed that most of the existing benchmarks are somewhat outdated. Is there any possibility of releasing these latest evaluations, for models like GPT-4o, LLaMA 3.…

rgtjf updated 2 weeks ago
1
OFA-Sys/AIR-Bench #3

Request for Complete Test Script for Qwen2-Audio on AIR Benc…

Hi, I'm currently trying to replicate the performance of Qwen2-Audio on the AIR Bench. However, I noticed that the repository at [AIR-Bench](https://github.com/OFA-Sys/AIR-Bench/blob/main/score_cha…

whwu95 updated 1 month ago
7
IVGSZ/Flash-VStream #3

Reproduction of results on MSVD and MSRVTT

A related issue posted in https://github.com/bytedance/Flash-VStream/issues/2. After **training the model by myself** following scripts in this official repo, the evaluation results on MSVD and M…

ShaneeyS updated 1 month ago
4
parth126/IT550 #19

Fashion Product Retrieval Using Semantic Search and Natural …

### Title Fashion Product Retrieval Using Semantic Search and Natural Language Generation ### Team Name InfoSphere ### Email 202318007@daiict.ac.in ### Team Member 1 Name Kavisha …

KavishaMadani updated 3 days ago
1
princeton-nlp/SWE-agent #471

Predictions for the following instance_ids were not found in…

### Describe the bug When I had to reproduce the logs as mentioned in the [Benchmarking](https://princeton-nlp.github.io/SWE-agent/usage/benchmarking/), the swe-agent created a patch but when eva…

Hk669 updated 3 months ago
8
alisawuffles/proxy-tuning #7

How can I reproduce the results on TruthfulQA?

I notice that running truthfulqa.sh requires "gpt_true_model_name" and "gpt_info_model_name", but the original model seems to be unavailable now.

SuperChanS updated 1 month ago
1
hsiehjackson/RULER #12

gpt-4o results?

Would love to see results for gpt-4o. There was some claimed improvement in its abilities: http://nian.llmonpy.ai/

the21st updated 5 days ago
3
EleutherAI/lm-evaluation-harness #2094

Having issues with MMLU benchmark

Hi, I am trying to run some LLMs (currently trying openai models) on MMLU. My first question is which configuration is the standard setup (5 shot without CoT)? What does flan mean in some of the c…

berkatil updated 1 month ago
9
FoundationVision/LlamaGen #20

Error in FID evaluation

Hi, I'm running FID evaluation code by following command ```bash bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_B.pt --gpt-…

Artanic30 updated 3 months ago
2
