llm-as-judge Search Results

549 results for llm-as-judge


castorini/umbrela #1

Feature specs discussion board for umBRELA

I am starting this thread for feature spec discussion for umBRELA @lintool @ronakice. Suggestions from my side: - parameter for specifying the number of samples for inference and later performin…

UShivani3 updated 6 months ago
5
All-Hands-AI/OpenHands #1085

Create a competitive agent with open LLMs

**What problem or use case are you trying to solve?** Currently OpenDevin somewhat works with the strongest closed LLMs such as GPT-4 or Claude Opus, but we have not confirmed good results with ope…

neubig updated 1 week ago
14
hewei2001/ReachQA #1

Discrepancy in ChartQA numbers reported for InternVL2-8B

Firstly, I wanted to say the paper was a great read. Thank you for the excellent work! I noticed that Table 4 reports the ChartQA performance on InternVL2-8B as 73.80, whereas the [InternVL](https:…

varadgunjal updated 2 weeks ago
3
metauto-ai/agent-as-a-judge #20

How to use code to collect trajectory? How to use LLM-as-a-j…

Hello, thank you for your great work. @mczhuge #First question: In your paper's fifth page, there is a sentence: "we record key decisions and actions made by the agentic systems through some custom …

zhang123434 updated 1 week ago
3
mlflow/mlflow #10809

[FR] Trace "hidden" prompts

### Willingness to contribute No. I cannot contribute this feature at this time. ### Proposal Summary When working with Mlflow Evaluation or AI agents, there are "hidden" system prompts that are no…

alena-m updated 2 months ago
4
fastenhealth/fasten-onprem #337

**ChatGPT-Like Offline Interface** for Querying Your Health …

> regarding the Chat GPT-like features, that's pretty far out on my road map currently, I want to add support for smart devices, wearables and home medical devices first. The other thing that makes t…

AnalogJ updated 10 months ago
3
h3nd3r/nutrition_pro #4

First pass at LLM as a judge

h3nd3r updated 1 month ago
1
irthomasthomas/undecidability #651

LLM API Host Leaderboard | Artificial Analysis

- [ ] [LLM API Host Leaderboard | Artificial Analysis](https://artificialanalysis.ai/leaderboards/hosts?parallel_queries=single&prompt_length=long) # LLM API Host Leaderboard | Artificial Analysis …

irthomasthomas updated 8 months ago
1
explodinggradients/ragas #1150

Getting error "Failed to parse output. Returning None" on fa…

I am getting error "Failed to parse output. Returning None" on faithfulness metric for some inputs. This is inconsistent behavior as it is haphazard and sometimes works, sometimes doesn't for the same…

ableiweiss updated 1 week ago
15
myendless1/llm-as-a-judge #1

Boy, can you carry me to the CCF A?

Hello, thank you

jiejiedaren updated 5 months ago
2

