-
Don't put them in public repos.
-
Thank you for the great work! I've noticed that most of the existing benchmarks are somewhat outdated. Is there any possibility of releasing these latest evaluations, for models like GPT-4o, LLaMA 3.…
rgtjf updated 2 weeks ago
-
Hi,
I'm currently trying to replicate the performance of Qwen2-Audio on AIR-Bench. However, I noticed that the repository at [AIR-Bench](https://github.com/OFA-Sys/AIR-Bench/blob/main/score_cha…
-
A related issue was posted at https://github.com/bytedance/Flash-VStream/issues/2.
After **training the model myself** following the scripts in this official repo, the evaluation results on MSVD and M…
-
### Title
Fashion Product Retrieval Using Semantic Search and Natural Language Generation
### Team Name
InfoSphere
### Email
202318007@daiict.ac.in
### Team Member 1 Name
Kavisha …
-
### Describe the bug
When I tried to reproduce the logs as described in [Benchmarking](https://princeton-nlp.github.io/SWE-agent/usage/benchmarking/), SWE-agent created a patch, but when eva…
Hk669 updated 3 months ago
-
I notice that running truthfulqa.sh requires "gpt_true_model_name" and "gpt_info_model_name", but the original model seems to be unavailable now.
-
Would love to see results for gpt-4o. There was some claimed improvement in its abilities: http://nian.llmonpy.ai/
-
Hi,
I am trying to run some LLMs (currently OpenAI models) on MMLU. My first question is: which configuration is the standard setup (5-shot without CoT)? What does flan mean in some of the c…
-
Hi, I'm running the FID evaluation code with the following command:
```bash
bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_B.pt --gpt-…