llm-evaluation Search Results

1000+ results for llm-evaluation

Best match

kubeedge/ianvs #123

Multimodal Large Model Joint Learning Algorithm: Reproductio…

**What would you like to be added/modified**: A benchmark suite for multimodal large language models deployed at the edge using KubeEdge-Ianvs: 1. Modify and adapt the existing edge-cloud data c…

CreativityH updated 1 week ago
51
hegelai/prompttools #69

Robustness evaluation

### 🚀 The feature Request from potential user: "There are two main aspects, 1) adjusting prompts that changing semantic words does not trigger hallucination, 2) the prompt itself is such that LLM doe…

steventkrawczyk updated 8 months ago
1
langchain-ai/langsmith-cookbook #253

The RAGAS example doesn't seem to work

Looking at the RAGAs sample notebook ... [https://github.com/langchain-ai/langsmith-cookbook/blob/main/testing-examples/ragas/ragas.ipynb](https://github.com/langchain-ai/langsmith-cookbook/blob/ma…

dividor updated 1 month ago
1
OpenBMB/MiniCPM-V #486

[BUG] Data fetch error - typo

### Is there an existing issue / discussion for this? - [X] I have searched the existing issues / discussions ### Is there an existing ans…

anthisyme updated 2 days ago
5
microsoft/autogen #3247

An error occurred: Error code: 500 - {'error': {'message': '…

### Describe the issue Hello, I am trying to use Autogen for this multiagent healthcare system. The code looks like this: config_list = [ { "model": "gpt-3.5-turbo-16k", …

Adibahaq updated 2 weeks ago
3
aws/fmeval #163

[Feature] LLM-based (QA Accuracy) eval algorithm

The metrics-based approaches in the `QAAccuracy` eval algorithm seem to harshly penalize verbose models (like Claude) on datasets with concise reference answers (like SQuAD). It'd be useful if this…

athewsey updated 7 months ago
2
embeddings-benchmark/mteb #1149

[New dataset request] Please add MKQA

## Summary [MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering](https://aclanthology.org/2021.tacl-1.82.pdf) > _"MKQA contains 10,000 queries sampled from t…

PrithivirajDamodaran updated 3 weeks ago
8
explodinggradients/ragas #1090

RAGAS with huggingface models

**Describe the bug** A clear and concise description of what the bug is. I tried using RAGAS with a model that is not OpenAI. In general whatever model I use I get this error back: ``` File …

SalvatoreRa updated 1 month ago
7
zou-group/textgrad #16

Asynchronous calls

Hi, are you planning to make textgrad LLM calls asynchronous? I tried to start adding asynchronous methods to make at least evaluation calls and inference (everything that is forward) asynchrono…

ajms updated 2 months ago
3
finos/ai-readiness #30

Author the Threats and Controls

We need to start dividing up the work, authoring the various Threats / Controls. We'll use this issue to manage that work and their assignments. Each threat / control is 'ticked' when assigned to …

ColinEberhardt updated 2 days ago
12
