evaluate-llm Search Results

All-Hands-AI/OpenHands #5240

FailOnRecoverableError mode

**What problem or use case are you trying to solve?** This is mostly for evaluation purpose. I am not sure what this mode should be properly named, so let me describe the scenario here: 1) Somet…

li-boxuan updated 17 hours ago

comet-ml/opik #632

[FR]: UI: See Traces and LLM Calls of Evaluations when using…

### Proposal summary Show in the UI the traces that are produced when using LLMs as judges, both general and LLM spans. ### Motivation When using LLM-as-a-judge metrics in your evaluations, it is usef…

SrBliss updated 5 days ago

explodinggradients/ragas #1692

Exception raised in Job APIConnectionError(Connection error.…

Hello, I am using local data, including reference and response data, with Azure, and then using ragas to compute accuracy metrics, but errors are reported: ` Evaluating: 0%| | 1/301 [00:04

ldzh-97 updated 6 days ago

AkihikoWatanabe/paper_notes #1431

Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-J…

https://eugeneyan.com/writing/llm-evaluators/

AkihikoWatanabe updated 1 month ago

yamalight/litlytics #1

Evaluate using ONNX for local LLMs

Evaluate using https://github.com/xenova/transformers.js as a local executor, or as a fallback for when WebGPU is not available

yamalight updated 1 month ago

elastic/kibana #200064

[Observability AI Assistant] [Root Cause Analysis] Create an…

As we begin to evaluate LLM assisted root cause analysis, we need a way to be able to evaluate the validity and usefulness of the results. Historically, our process for evaluating these results has …

dominiqueclarke updated 1 week ago

comet-ml/opik #567

[FR]: Enable Opik to display additional media formats, inclu…

### Proposal summary ## Feature Request Enable Opik to display additional media formats, including audio, PDF, and video. ## Background Opik currently supports only image display, which li…

pleomax0730 updated 2 weeks ago

confident-ai/deepeval #1139

Error while calculating Knowledge retention ; Evaluation LLM…

**Describe the bug** While running the **Knowledge retention** metric, I get an error. This does not happen for all of the LLMTestcases; I get a correct knowledge retention score for many inputs. …

jaysudhakaran updated 2 weeks ago

lilezek/llm-xpath #1

Has the project's performance been tested?

Hello, Thank you very much for open-sourcing this project! I am currently researching tools that utilize LLMs for XPath extraction and came across your project, llm-xpath. It looks very interesting…

YeSZ1520 updated 5 days ago

THUDM/LongWriter #35

how to use a local LLM to evaluate prediction quality? For e…

### Feature request How to use a local LLM to evaluate prediction quality? For example, Llama-3-70B-Instruct? ### Motivation How to use a local LLM to evaluate prediction quality? For …

txchen-USTC updated 4 weeks ago

1000+ results for evaluate-llm
