evaluate-llm Search Results

1000+ results for evaluate-llm

compdemocracy/polis #1842

Zero shot polis report to tldr summary

This issue is a feature! Append the raw text (copy and paste) of any automatically generated polis report after this prompt. ![sji-tldr](https://github.com/user-attachments/assets/0d1237bb-1267-481…

colinmegill updated 1 week ago · 5 comments

TechLabs-Berlin/project_proposals #18

teachMy - AI Chatbot by Oula Suliman

### Which track are you doing? DL ### What's the problem you are trying to solve? Do you have material that's hard to understand? A difficult subject at school, a legal contract or a medical docum…

valiantone updated 10 months ago · 16 comments

THUDM/Android-Lab #2

Any stable evaluation metrics

Hi! Thanks for the great work on OS-agent. I noticed that the evaluation is done automatically by LLMs. Are there any tasks that can be evaluated without LLMs, that is, with a stable evaluation?

ReinholdM updated 1 week ago · 1 comment

andrewyng/translation-agent #27

Related Work on Improving the Translation Agent

Hello Andrew, thank you for your excellent contributions to the translation agent! I am reaching out to highlight our recent work, "**TEaR: Improving LLM-based Machine Translation with Systematic S…

yanzhangnlp updated 5 months ago · 5 comments

d3rp3tt3/AI-Trivia-ChatBot-Project-3 #34

Add tests for accuracy of the bot

## Overview We need a way to evaluate how accurate the chatbot workflow is. We will focus on how well the LLM agent is following its coded prompt instructions. ## Details * Create a small set of test…

d3rp3tt3 updated 10 hours ago · 1 comment

mlpc-ucsd/BLIVA #24

evaluate code doesn't exist

mingtouyizu updated 4 months ago · 5 comments

znzjugod/live-old-learn-old #3

rag

1. llmware-ai/[llmware](https://github.com/llmware-ai/llmware): Unified framework for building enterprise RAG pipelines with small, specialized models (github.com) 2. https:[/](https://github.com/ll…

znzjugod updated 6 months ago · 1 comment

eturchenkov/hayloft #1

feat: integrating with llama_index and its evals from ragas

So I'm the co-maintainer of [ragas](https://github.com/explodinggradients/ragas) an evaluation tool for LLM pipelines and a contributor to llama_index, langchain and a few ml tools. We have been seein…

jjmachan updated 1 year ago · 3 comments

Significant-Gravitas/AutoGPT #4107

Implement structured planning & evaluation

* #6964 ## Discussion Before we can start managing the workflow of the agent, we have to give it some more structure. The different ways to implement this can generally be subdivided into 3 groups…

Boostrix updated 5 months ago · 4 comments

FlagOpen/FlagEmbedding #855

long-llm run for more than 1 epoch

If we follow the script settings of long-llm, where the parameter num_train_epoch is set to 1, it gives a really significant improvement over the original model. However, if we change the parameter to…

disperaller updated 4 months ago · 5 comments
