llm-as-judge Search Results

549 results for llm-as-judge

tarasglek/chatcraft.org #610

Add cohere free model to free endpoint, set it to default

They offer free models for non-prod usage. this is a 104B, way better than other free models ```bash curl --request POST \ --url https://api.cohere.ai/v1/chat \ --header 'accept: application…

tarasglek updated 7 months ago
2
irthomasthomas/undecidability #812

Challenges in Evaluating Agent Performance: A Critical Analy…

- [ ] [Challenges in Evaluating Agent Performance: A Critical Analysis](https://arxiv.org/html/2404.11584v1) # Challenges in Evaluating Agent Performance: A Critical Analysis ## Snippet "6.2 Challen…

ShellLM updated 7 months ago
1
mlopscommunity/open-questions-ai-quality #20

Ensuring metrics reflect user needs

How do we know that the metrics we use for training are reflective of real-world user needs?

adamboazbecker updated 6 months ago
2
PlebLab/Top-Builder-Season-1 #37

[Top Builder 2024]: uncleJim21 (CASCDR_R2)

# 🏗️ Top Builder 2024 Application Form to track progress through Round 1 - 3 ~ Currently in Round 1 ## 📝 Instructions 1. Only complete this form if you have been chosen for Top Builder, by PlebLab…

uncleJim21 updated 9 months ago
6
jehna/humanify #14

Ollama Support

is it possible to use llama3 via ollama rather than huggingface one?

0xrsydn updated 1 month ago
13
langfuse/langfuse #3760

bug: asking for score before reasoning in LLM-as-a-judge eva…

### Describe the bug Not really a traditional bug, but moreso an issue with the way the response for evals is structured. By asking an LLM for score first and reason later, we are significantly hampe…

nikshepsvn updated 1 month ago
6
explodinggradients/ragas #1555

Add BERTScore as potential "non-LLM" metric / MetricWithEmbe…

**Describe the Feature** Add BERTScore as additional evaluation metric scorer for context-precision and context-recall. **Why is the feature important for you?** As a RAGAS user trying to eva…

ahgraber updated 2 weeks ago
4
PlebLab/Top-Builder-Season-1 #41

[Top Builder 2024]: NEO

# 🏗️ Top Builder 2024 Application Form to track progress through Round 1 - 3 ~ Currently in Round 1 ## 📝 Instructions 1. Only complete this form if you have been chosen for Top Builder, by PlebLab…

GrihmLord updated 10 months ago
1
wandb/weave #1983

Evaluate model sometimes leads to 'Runtime Error: bound to a…

I've been enjoying the Weave library quite a bit, but I have been running into an issue using the Evaluate method. The issue is that 20% of the time, when running my evaluation, I get the `Runtime Err…

EdIzaguirre updated 4 months ago
3
huggingface/lighteval #318

[FT] LLM-as-judge example that doesn't require OPENAI_KEY or…

## Issue encountered While setting up the framework to evaluate using LLM-as-judge, it would be helpful to test end-to-end without special permissions like setting up openai_key or HF pro subscriptio…

chuandudx updated 3 weeks ago
3
