llm-evaluation-framework Search Results

484 results for llm-evaluation-framework

open-compass/opencompass #1427

[Feature] Unsupported Model Type in vLLM/LMDeploy accelerat…

### Describe the feature I tried the vLLM and LMDeploy using the following command: ``` python run.py \ --datasets humaneval_gen \ --hf-type chat \ --hf-path meta-llama/Meta-Llama-3-…

Rcrossmeister updated 1 month ago
4
haotian-liu/LLaVA #768

[Question] Regarding Captioning Evaluation on Flickr30k

### Question Hi, thanks for the great work! I have been trying to evaluate llava image captioning on Flickr30k, but I am not able to reproduce the results. While the original llava paper does not hav…

devaansh100 updated 1 week ago
8
EleutherAI/lm-evaluation-harness #1231

CoQA's implementation only predicts the last answer of each …

For CoQA, in coqa/utils.py, only the last answer of each text (i.e. the answer for the last turn_id, with all the previous questions and answers in the context window) is predicted. On the website of …

glerzing updated 10 months ago
1
explodinggradients/ragas #1205

Can Custom Metrics be added?

[x] I checked the [documentation](https://docs.ragas.io/) and related resources and couldn't find an answer to my question. **Your Question** what is unclear to you? What would you like to know? …

LDelPinoNT updated 2 months ago
6
run-llama/llama_index #16111

[Question]: How to evaluate Agent?

### Question Validation - [X] I have searched both the documentation and discord for an answer. ### Question I designed a chatbot with an Agent to perform a series of actions. My agent works like…

NguyenDinhTiem updated 1 week ago
11
explodinggradients/ragas #1387

Japanese Specification for Answer Relevance

- [x] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug. **Your Question** I would like to use Answer Relevance for RAG evaluation in Jap…

TakutoIyanagi-littletree updated 1 month ago
8
AkihikoWatanabe/paper_notes #1214

Leveraging Large Language Models for NLG Evaluation: A Surve…

# URL - https://arxiv.org/abs/2401.07103 # Affiliations - Zhen Li, N/A - Xiaohan Xu, N/A - Tao Shen, N/A - Can Xu, N/A - Jia-Chen Gu, N/A - Chongyang Tao, N/A # Abstract - In the rapidly…

AkihikoWatanabe updated 6 months ago
1
KTH/devops-course #1016

AIOps / MLOps / Infrastructure and software engineering for …

* https://en.wikipedia.org/wiki/MLOps

monperrus updated 2 weeks ago
35
explodinggradients/ragas #1100

Local LLM with Ragas evaluation issue

[ ] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug. **Describe the bug** I am trying to use a local LLM in the evaluate function, whe…

SalwaMostafa updated 2 months ago
6
tjunlp-lab/Awesome-LLMs-Evaluation-Papers #7

SeaEval: Multilingual LLM Evaluation

Please note our paper on evaluation, which could be an important building block for multilingual evaluation and cultural understanding. [SeaEval for Multilingual Foundation Models: From Cross-Lingu…

BinWang28 updated 11 months ago
7
