-
They offer free models for non-production use. This one is a 104B-parameter model, far better than other free models.
```bash
curl --request POST \
  --url https://api.cohere.ai/v1/chat \
  --header 'accept: application…
```
-
- [ ] [Challenges in Evaluating Agent Performance: A Critical Analysis](https://arxiv.org/html/2404.11584v1)
# Challenges in Evaluating Agent Performance: A Critical Analysis
## Snippet
"6.2 Challen…
-
How do we know that the metrics we use for training are reflective of real-world user needs?
-
# 🏗️ Top Builder 2024 Application Form to track progress through Round 1 - 3 ~ Currently in Round 1
## 📝 Instructions
1. Only complete this form if you have been chosen for Top Builder, by PlebLab…
-
Is it possible to use Llama 3 via Ollama rather than the Hugging Face one?
-
### Describe the bug
Not really a traditional bug, but more an issue with the way the response for evals is structured. By asking an LLM for a score first and a reason later, we are significantly hampe…
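
One way to picture the alternative ordering is a judge prompt that asks for the reason before the score. The sketch below is only an illustration, not the library's actual prompt or parser: `JUDGE_PROMPT`, `build_prompt`, `parse_judgement`, and the 1-5 scale are all hypothetical names and choices.

```python
import json

# Hypothetical prompt template (an assumption, not taken from any library):
# the reason is requested before the score, so the model writes its
# justification before committing to a number.
JUDGE_PROMPT = (
    "Evaluate the answer to the question below.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    'Respond with JSON in this exact key order: '
    '{{"reason": "<short explanation>", "score": <integer 1-5>}}'
)


def build_prompt(question: str, answer: str) -> str:
    """Fill the template with the sample under evaluation."""
    return JUDGE_PROMPT.format(question=question, answer=answer)


def parse_judgement(raw: str) -> tuple[str, int]:
    """Parse the judge's JSON reply and validate the assumed 1-5 score range."""
    data = json.loads(raw)
    reason = str(data["reason"])
    score = int(data["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return reason, score
```

The reply from whichever model is used would be passed to `parse_judgement`; keeping the parser strict makes malformed judge output fail loudly instead of silently producing a score.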
-
**Describe the Feature**
Add BERTScore as an additional evaluation metric scorer for context precision and context recall.
**Why is the feature important for you?**
As a RAGAS user trying to eva…
-
I've been enjoying the Weave library quite a bit, but I have been running into an issue using the Evaluate method. The issue is that 20% of the time, when running my evaluation, I get the `Runtime Err…
-
## Issue encountered
While setting up the framework to evaluate using LLM-as-judge, it would be helpful to test end-to-end without special permissions like setting up openai_key or HF pro subscriptio…
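
A generic way to exercise an LLM-as-judge pipeline without any credentials is to inject a stub judge in place of the hosted model. This is a minimal sketch under that assumption, not the framework's own API: `Judge`, `stub_judge`, and `evaluate` are hypothetical names introduced for illustration.

```python
from typing import Callable

# Hypothetical judge interface: takes a prompt, returns the raw model reply.
Judge = Callable[[str], str]


def stub_judge(prompt: str) -> str:
    """Offline stand-in for a hosted LLM: always returns the same canned score."""
    return "5"


def evaluate(samples: list[str], judge: Judge) -> float:
    """Score every sample with the given judge and return the mean score."""
    if not samples:
        raise ValueError("no samples to evaluate")
    scores = [int(judge(sample)) for sample in samples]
    return sum(scores) / len(scores)


# End-to-end run with no API key: the pipeline logic is exercised,
# only the model call is faked.
mean_score = evaluate(["sample answer A", "sample answer B"], stub_judge)
```

Because the judge is passed in as a plain callable, the same `evaluate` function could later receive a real model client without changes to the surrounding code.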