llm-as-judge Search Results

548 results for llm-as-judge

Best match


langchain-ai/langsmith-sdk #1073

Issue: dataset types lacking simple string option

### Issue you'd like to raise. I'm new to LangSmith and find the dataset structure more complicated (and confusing) than it needs to be. In some ways, the dataset is treated like a table, with Inp…

davidgilbertson updated 1 month ago
1
Chaos96/fourierft #12

Thank you for your code. Could you please provide the code f…

LiZhangMing updated 2 months ago
1
truera/trulens #1639

[BUG] Pace is not thread safe

**Bug Description** When I query the "anthropic.claude-3-5-sonnet-20240620-v1:0" model on Bedrock (it also happens with ("anthropic.claude-3-haiku-20240307-v1:0", but here it's a smaller issue), a `b…

drooms-sandrus updated 3 days ago
1
vllm-project/vllm #8237

[Bug]: requests with response_format cause vllm to hang with…

### Your current environment The output of `python collect_env.py` ```text Collecting environment information... PyTorch version: 2.4.0+cu121 Is debug build: False CUDA used to build PyTorch…

rymc updated 2 months ago
2
explodinggradients/ragas #859

[R-240] (docs): Document how to evaluating with a locally ho…

[X] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug. **Describe the bug** I have a locally hosted LLM which I am intending to use as a jud…

Exploding-squid updated 3 months ago
9
finos/ai-readiness #30

Tracking issue for authoring of Threats and Controls

We need to start dividing up the work, authoring the various Threats / Controls. We'll use this issue to manage that work and their assignments. Each threat / control is 'ticked' when assigned to …

ColinEberhardt updated 1 week ago
23
OFA-Sys/AIR-Bench #3

Request for Complete Test Script for Qwen2-Audio on AIR Benc…

Hi, I'm currently trying to replicate the performance of Qwen2-Audio on the AIR Bench. However, I noticed that the repository at [AIR-Bench](https://github.com/OFA-Sys/AIR-Bench/blob/main/score_cha…

whwu95 updated 3 months ago
7
run-llama/llama_index #16111

[Question]: How to evaluate Agent?

### Question Validation - [X] I have searched both the documentation and discord for an answer. ### Question I designed a chatbot with an Agent to perform a series of actions. My agent works like…

NguyenDinhTiem updated 3 weeks ago
11
comet-ml/opik #411

[FR]: Task agnositic LLM as a judge metric like g-eval

### Willingness to contribute No. I can't contribute this feature at this time. ### Proposal summary There are currently LLM as a judge metrics like Hallucination detection and Moderation score. Ho…

jverre updated 2 weeks ago
1
huggingface/cookbook #216

Turkish translations

Opening this issue related to translation to Turkish where I did call for contributions [here](https://x.com/mervenoyann/status/1848267466314563825). 🇹🇷❣️ If you feel like getting started, you can …

merveenoyan updated 3 weeks ago
11

