-
**Describe the bug**
When using `IServiceCollection.AddKernel` with all connectors registered as keyed services (using a ServiceId), it is not possible to call `kernel.InvokePromptAsync` without specifying a serviceId in…
-
Greetings! Great work on using open-source language-model agents to beat GPT-4 on long-context QA.
We have reproduced an agent framework based on the description given in your paper, but were unsure…
-
Would love to see results for GPT-4o. There was some claimed improvement in its abilities: http://nian.llmonpy.ai/
-
**❗BEFORE YOU BEGIN❗**
Are you on Discord? 🤗 We'd love to have you asking questions on Discord instead: https://discord.com/invite/a3K9c8GRGt
**Describe the bug**
No matter what I try, I keep get…
-
- [ ] [AlpacaEval: Revolutionizing Model Evaluation with LLM-Based Automatic Tools](https://github.com/tatsu-lab/alpaca_eval?tab=readme-ov-file#making-a-new-evaluator)
# AlpacaEval: Revolutionizing M…
-
I notice that running `truthfulqa.sh` requires `gpt_true_model_name` and `gpt_info_model_name`, but it seems the original model is no longer available.
-
Hello, I have a question: after I executed `model_inference.py` and got the results, do I need to use my own model to run inference on all the questions before executing `llm_eval.py`? What will the result be after …
-
Setup:
- Python version 3.11
- Windows machine
- `pip install ragchecker`
- `python -m spacy download en_core_web_sm`

It seems like there is trouble connecting to Azure OpenAI or utilising it. I used the…
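If the failure is authentication-related, one thing worth ruling out first is missing Azure OpenAI credentials in the environment. The variable names below follow the openai Python SDK convention; whether ragchecker reads these exact variables is an assumption, so check its configuration docs:

```shell
# Conventional Azure OpenAI settings read by the openai Python SDK.
# That ragchecker picks these up is an assumption, not confirmed.
export AZURE_OPENAI_API_KEY="<your-key>"
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
export OPENAI_API_VERSION="2024-02-01"  # use the API version your deployment supports
```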
-
Hello Conifer Authors,
Could you please clarify the specific versions of GPT-4 used during the evaluation process? Specifically, I’m looking for the exact versions used in:
- FollowBench (e.g., …
-
![image](https://github.com/lunary-ai/llm-benchmarks/assets/8592144/edbdd956-e8ad-48bf-84dc-7e0192ad6c4d)
I randomly checked the [GPT 4 03/14 (Legacy)](https://llm-benchmarks.vercel.app/gpt-4-0314) result bu…