-
**Describe the bug**
When using `IServiceCollection.AddKernel` with all connectors registered as keyed services (using a ServiceId), it is not possible to call `kernel.InvokePromptAsync` without specifying a serviceId in…
-
Greetings! Great work on using open-source language-model agents to beat GPT-4 on long-context QA.
We have reproduced an agent framework based on the description given in your paper, but were unsure…
-
Would love to see results for GPT-4o. There was some claimed improvement in its abilities: http://nian.llmonpy.ai/
-
**❗BEFORE YOU BEGIN❗**
Are you on Discord? 🤗 We'd love to have you asking questions on Discord instead: https://discord.com/invite/a3K9c8GRGt
**Describe the bug**
No matter what I try, I keep get…
-
- [ ] [AlpacaEval: Revolutionizing Model Evaluation with LLM-Based Automatic Tools](https://github.com/tatsu-lab/alpaca_eval?tab=readme-ov-file#making-a-new-evaluator)
# AlpacaEval: Revolutionizing M…
-
I notice that running `truthfulqa.sh` requires `gpt_true_model_name` and `gpt_info_model_name`, but it seems the original model is no longer available.
-
Hello, I have a question: after I executed `model_inference.py` and got the results, do I need to use my own model to run inference on all the questions before executing `llm_eval.py`? What will the result be after …
-
Setup:
- Python version 3.11
- Windows machine
- `pip install ragchecker`
- `python -m spacy download en_core_web_sm`

It seems like there is trouble connecting to Azure OpenAI or utilising it. I used the…
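If the failure is authentication-related, one thing worth ruling out first is missing Azure OpenAI credentials in the environment. The variable names below follow the openai Python SDK convention; whether ragchecker reads these exact variables is an assumption, so check its configuration docs:

```shell
# Conventional Azure OpenAI settings read by the openai Python SDK.
# That ragchecker picks these up is an assumption, not confirmed.
export AZURE_OPENAI_API_KEY="<your-key>"
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
export OPENAI_API_VERSION="2024-02-01"  # use the API version your deployment supports
```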
-
Hello Conifer Authors,
Could you please clarify the specific versions of GPT-4 used during the evaluation process? Specifically, I’m looking for the exact versions used in:
- FollowBench (e.g., …
-
![image](https://github.com/lunary-ai/llm-benchmarks/assets/8592144/edbdd956-e8ad-48bf-84dc-7e0192ad6c4d)
I randomly checked the [GPT 4 03/14 (Legacy)](https://llm-benchmarks.vercel.app/gpt-4-0314) result bu…