-
## Background
We are curious to know whether ontology score correlates with performance on downstream tasks.
We could evaluate performance on downstream tasks ourselves, but as a first approximation,…
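A minimal sketch of what that first-approximation check could look like, assuming we already had paired scores per model or configuration (all names and numbers below are made up for illustration):

```python
from scipy.stats import spearmanr

# Hypothetical paired scores, one entry per model/configuration.
ontology_scores = [0.62, 0.71, 0.55, 0.80, 0.68]
downstream_scores = [0.58, 0.74, 0.51, 0.77, 0.70]  # e.g. downstream QA accuracy

# Spearman rank correlation is robust to monotone but non-linear relationships,
# which seems appropriate when the two scores live on different scales.
rho, p_value = spearmanr(ontology_scores, downstream_scores)
print(f"Spearman rho={rho:.3f}, p={p_value:.3f}")
```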
-
In attempting to follow the setup in the README, I am able to successfully call:
```
poetry poe local-infrastructure-up
```
I can then access the ZenML dashboard. However, none of the pipelines s…
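As a quick sanity check (a sketch, assuming a recent ZenML client), listing pipelines programmatically shows whether anything was ever registered with the server the dashboard is pointing at:

```python
from zenml.client import Client

client = Client()

# If this comes back empty, the pipelines were never registered with
# the server this client (and hence the dashboard) is connected to.
for pipeline in client.list_pipelines():
    print(pipeline.name)
```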
-
Beyond LLM supports 4 evaluation metrics: Context relevancy, Answer relevancy, Groundedness, and Ground truth.
We would like to add new evaluation metric support to evaluate LLM/RAG…
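As a rough illustration of the kind of pluggable metric interface we have in mind (this is not Beyond LLM's actual API; every name below is hypothetical):

```python
from typing import Protocol

class EvalMetric(Protocol):
    """Hypothetical interface a new evaluation metric plug-in would implement."""

    name: str

    def score(self, question: str, answer: str, contexts: list[str]) -> float:
        """Return a score in [0, 1] for one question/answer/contexts triple."""
        ...
```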
-
- [ ] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug.
**Describe the bug**
Message 'No statements were generated from the answer' was sen…
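A minimal repro sketch, assuming the classic 0.1-era `ragas.evaluate` API (the sample data is made up):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness

data = {
    "question": ["What is the capital of France?"],
    # Very short answers can yield no extractable statements,
    # which is when this message tends to appear.
    "answer": ["Paris."],
    "contexts": [["Paris is the capital and largest city of France."]],
}

result = evaluate(Dataset.from_dict(data), metrics=[faithfulness])
print(result)
```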
-
Moving on to the **benchmark_models** tool in `evaluate_llm_tool.py`, I had a few suggestions that might help improve it:
- Docstring Accuracy:
The docstring currently mentions a prompt_set…
-
We currently leverage some LLM-based evaluation metrics from ragas: https://github.com/explodinggradients/ragas
namely, `llm_context_precision`, `llm_context_recall`, and `llm_answer_relevance` in thi…
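For reference, a sketch of calling the underlying ragas metrics directly, using ragas' own metric names from the 0.1-era API (the placeholder strings stand in for real data):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall

dataset = Dataset.from_dict({
    "question": ["..."],
    "answer": ["..."],
    "contexts": [["..."]],
    # context_recall needs a reference answer; the column name varies
    # across ragas versions ("ground_truth" vs "ground_truths").
    "ground_truth": ["..."],
})

scores = evaluate(dataset, metrics=[context_precision, context_recall, answer_relevancy])
```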
-
### Bug Description
On llama-index 0.11.22 and llama-index-finetuning 0.2.1, I was attempting to follow the documentation to fine-tune the BAAI/bge-small-en-v1.5 model on my own dataset. I attempted…
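For context, the flow I was following looks roughly like this (imports per the llama-index 0.11 docs; the dataset file and output path are placeholders):

```python
from llama_index.core.evaluation import EmbeddingQAFinetuneDataset
from llama_index.finetuning import SentenceTransformersFinetuneEngine

# Load a QA dataset previously generated with generate_qa_embedding_pairs.
train_dataset = EmbeddingQAFinetuneDataset.from_json("train_dataset.json")

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en-v1.5",
    model_output_path="bge-small-finetuned",
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()
```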
-
1. Performance metrics
2. Reliability measures
3. Areas for improvement
4. Test suite setup: updated the `queriesandresponses.json` file in the repo.
-
## Issue encountered
It would be good to have a system for evaluating both the relevance of the retrieved RAG context and its use by the LLM in producing the response. My first intuition would be a multi-stage system …
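A rough sketch of that staging (every function here is a hypothetical placeholder for whichever scorers we settle on):

```python
def score_retrieval(question: str, contexts: list[str]) -> float:
    """Stage 1: how relevant are the retrieved chunks to the question?"""
    raise NotImplementedError  # e.g. an LLM judge or embedding similarity

def score_grounding(answer: str, contexts: list[str]) -> float:
    """Stage 2: how much of the answer is supported by those chunks?"""
    raise NotImplementedError  # e.g. statement-level faithfulness checks

def evaluate_response(question: str, contexts: list[str], answer: str) -> dict:
    # Scoring the stages separately tells us whether a bad answer came
    # from bad retrieval or from the LLM ignoring good retrieval.
    return {
        "retrieval_relevance": score_retrieval(question, contexts),
        "grounding": score_grounding(answer, contexts),
    }
```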
-
## Summary
This template is intended to capture a few base requirements that must be met before filing a PR that contains a new blog post submission.
Please fill out this form in its…