evaluate-llm Search Results

1000+ results
for evaluate-llm


ys-zong/VLGuard #7

[Expect complete evaluation code] Cannot reproduce the resul…

**My observation** - With https://github.com/ys-zong/VLGuard/blob/main/VLGuard_eval.py, I am able to reproduce results not too far from Table 2 for **VLGuard dataset**. - However, **I cannot reprodu…

oncleJules updated 2 weeks ago
1
explodinggradients/ragas #1421

evaluate function

I created a subclass of BaseRagasEmbeddings because I already have all the embeddings for context, query, and question. I did this to avoid using the OpenAI API key, because it is costly and also I want …

amin-kh96 updated 1 month ago
4
FlagOpen/FlagEmbedding #896

Evaluate an LLM reranker after finetuning

Hello, is there a way to evaluate an LLM reranker after I finetune it on my own training dataset? Also, how should the test be structured? Same as the training data (e.g. toy_finetune_data.jsonl)? Th…

majdabd updated 5 months ago
1
explodinggradients/ragas #1449

Is there any example using local open source LLM model for e…

Using the OpenAI API might not be feasible in some environments. Could someone provide a reference or example link for running evaluation with a local LLM?

RyanTree-HS updated 1 month ago
1
explodinggradients/ragas #1450

AttributeError('ContextRecallClassificationAnswers' object h…

[ ] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug. **Describe the bug** I cannot evaluate open source gemma:2b model using ragas. I…

nprime496 updated 1 month ago
1
ScandEval/ScandEval #540

[BENCHMARK DATASET REQUEST] schibsted-text-tasks

### Dataset name Schibsted text tasks ### Dataset link https://huggingface.co/collections/Schibsted/schibsted-text-tasks-66655bce94d0f40432519347 ### Dataset languages - [ ] Danish - [X] Swedish …

simeneide updated 4 weeks ago
2
run-llama/llama_index #17012

[Bug]: guideline evaluation is throwing error

### Bug Description guideline evaluation is throwing error saying missing model ### Version latest ### Steps to Reproduce run the guideline evaluation example ### Relevant Logs/Tracbacks ```sh…

Rohith-Scalers updated 1 week ago
7
xp1632/Aalen_working_log #5

`Lowcoder: Visual Programming + Large Language Model`- AI fo…

### paper `AI for low-code for AI` **Research Contribution** - this paper argues that the visual programming component compensates for the ambiguity caused by the natural language programming compo…

xp1632 updated 8 hours ago
1
open-compass/CriticEval #3

[Bug] Subjective evaluation score is parsed incorrectly

For example, in the following case, \*\*Score: 9\*\* is incorrectly parsed as 5.0 ```python { "question": "To cook perfectly golden pancakes,", "obj": { "generation_a": "Mix the ingredients together in a bowl and pour it onto…

WencWu updated 15 hours ago
1
IBM/data-prep-kit #597

[Feature] Explore agentic workflow capabilities for the crea…

### Search before asking - [X] I searched the [issues](https://github.com/IBM/data-prep-lab/issues) and found no similar issues. ### Component Other ### Feature LLM-based agentic workflows are e…

roytman updated 1 week ago
2

