gpt-evaluation Search Results

1000+ results
for gpt-evaluation

Best match

stanford-crfm/helm #2504

Problems and questions on the reproducibility of experiments

Hi, I'm trying to reproduce some of your evaluation experiments. In particular, I'm doing a **robustness evaluation on the task of Sentiment Analysis** on the *IMDB* dataset. As you said, I'm using the…

CristianCosci updated 7 months ago
4
strickvl/mlops-dot-systems #13

posts/2024-07-01-full-finetuned-model-evaluation

# Alex Strick van Linschoten - My finetuned models beat OpenAI’s GPT-4. Finetunes of Mistral, Llama3 and Solar LLMs are more accurate for my test data than OpenAI’s models. [https://mlops.systems/pos…

utterances-bot updated 4 months ago
3
microsoft/RAG_Hack #108

Project: Quiz Maker - A GenAI tool for teachers to generate …

### Project Name Quiz Maker ### Description Quiz Maker is a GenAI tool that uses RAG to generate quizzes on the fly based on uploaded content. It is an ASP.NET web application that utilises Sema…

shan-s updated 2 weeks ago
1
EvolvingLMMs-Lab/lmms-eval #199

[Performance] Too slow MMBench evaluation

I’m running an evaluation on the MMBench-en dataset. The evaluation on the MME benchmark went smoothly, but when I switched to MMBench-en, the evaluation speed significantly slowed down. I’m using …

minjunsz updated 2 months ago
2
MadcowD/ell #285

Evaluations in ell

This is a major feature release. Spec: https://github.com/MadcowD/ell/blob/cd64ab9bb0d3a09195fef7a32ef77ac5d7e6c912/docs/ramblings/evalspec.md Ramblings: https://github.com/MadcowD/ell/blob/cd64ab9…

MadcowD updated 1 week ago
10
All-Hands-AI/OpenHands #4355

[Bug]: Does the Browsing Agent need run_ipython action?

### Is there an existing issue for the same bug? - [X] I have checked the existing issues. ### Describe the bug and reproduction steps Running Browsing Agent with Deepseek, I got a syntax err…

enyst updated 4 weeks ago
3
cremebrule/digital-cousins #13

reproduce the released checkpoints

Could you please provide the process for reproducing the training of the 'cousin_ckpt.pth' and 'twin_ckpt.pth' files? Thank you.

andyaloha updated 4 days ago
5
Azure-Samples/azure-search-openai-demo #1989

o1-preview integration / testing

### This issue is for a: (mark with an `x`) ``` - [ ] bug report -> please search issues before submitting - [X] feature request - [ ] documentation issue or request - [ ] regression (a behavior …

ratkinsoncinz updated 1 month ago
3
WolframResearch/Chatbook #322

Chat GPT 4 unresponsive

Chat GPT-4 can't respond. It just sits and thinks. Chat GPT 3.5 has no problems. Debug Data | Property | Value | | --- | --- | | Name | ``"Wolfram/Chatbook"`` | | Version | ``"1.1.1"`` | …

OrionSmedley updated 2 months ago
2
jinlanfu/GPTScore #3

About the Evaluation of Dialogue Generation

GPTScore contains very elaborate experimental results for the generation-based evaluation method for lots of downstream NLG tasks, and thank you so much for your work. Recently, I also notice that …

gmftbyGMFTBY updated 3 months ago
4
