-
**What would you like to be added/modified**:
A benchmark suite for large language models deployed at the edge using KubeEdge-Ianvs:
1. Interface Design and Usage Guidelines Document;
2. Implem…
-
Hi, thanks for your great work! I have read your MovieChat+ paper and noticed that the Zero-shot QA Evaluation result of MovieChat on EgoSchema is 53.5, while the evaluation result in this CVPR paper (…
-
Hi,
We have run three Google Gemma models with Winogrande on MTL or LNL, and we got much lower accuracy than the Open LLM Leaderboard. The detailed data are below:
Model | Precision | Device | Trans…
-
**Code**
I tried to use `evaluate` with a `LangchainLLMWrapper`; however, for some metrics it still requires an OpenAI key. Here is the code:
```python
from ragrank import evaluate
from ragrank.evaluation import…
-
Hello 👋
First of all thank you for the great work and evaluation results!
I understand that in many cases you predicted outputs for each question based on the choice that minimizes the loss…
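The selection rule described above (pick the answer option whose continuation has the lowest loss, i.e. the highest likelihood under the model) can be sketched as follows. This is a hypothetical illustration: the per-choice losses are stubbed here, whereas in a real harness they would be the model's negative log-likelihoods for each candidate answer.

```python
# Minimal sketch of loss-minimizing multiple-choice prediction.
# `choice_losses` stands in for per-option average negative log-likelihoods
# that a real evaluation harness would compute from the model.

def pick_by_min_loss(choice_losses):
    """Return the index of the answer choice with the smallest loss."""
    return min(range(len(choice_losses)), key=lambda i: choice_losses[i])

# Example: three candidate answers with stubbed losses.
losses = [2.31, 1.08, 3.92]
prediction = pick_by_min_loss(losses)  # picks index 1, the lowest-loss choice
```

The same argmin-over-losses rule underlies log-likelihood-based multiple-choice scoring in common evaluation harnesses.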
-
- [ ] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug.
**Describe the bug**
It's good that almost all metrics in ragas can be adapted to oth…
-
When errors happen in the evaluation view, there are a couple of problems:
* The error in the LLM app is shown in black and not in red as expected
* The issue is in the LLM app response and not in…
-
Below are the benchmark results on both THUDM/chatglm3-6b and openbmb/MiniCPM-2B-sft-bf16, which show that chatglm3-6b has better throughput than MiniCPM-2B. Considering MiniCPM-2B is a 2…
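For context on how throughput numbers like those above are typically obtained, here is a minimal timing sketch. All names are hypothetical (the excerpt does not show the actual benchmark harness), and the model call is stubbed so the example is self-contained; a real run would invoke the model's generate function instead.

```python
import time

def throughput_tokens_per_sec(generate, prompt, n_tokens):
    """Time one generation call and return tokens decoded per second."""
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub that pretends each token takes ~1 ms to decode; a real benchmark
# would pass the model's generation callable here instead.
def fake_generate(prompt, n_tokens):
    time.sleep(0.001 * n_tokens)

tps = throughput_tokens_per_sec(fake_generate, "hello", 32)
```

In practice one would also average over several runs and discard a warm-up iteration, since the first call often includes one-time setup cost.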
-
- [ ] [AlpacaEval: Revolutionizing Model Evaluation with LLM-Based Automatic Tools](https://github.com/tatsu-lab/alpaca_eval?tab=readme-ov-file#making-a-new-evaluator)
# AlpacaEval: Revolutionizing M…
-
## User story
1. As a data engineer,
2. I want / need to implement and automate the calculation of key performance metrics
3. So that we can iteratively evaluate the performance of our LLM in answerin…