evaluate-llm Search Results

1000+ results
for evaluate-llm


BookmarksGPT/BookmarksGPT #15

Benchmark different LLMs

gevou updated 10 months ago
1
AkihikoWatanabe/paper_notes #1179

The Unlocking Spell on Base LLMs: Rethinking Alignment via I…

# URL - https://arxiv.org/abs/2312.01552 # Affiliations - Bill Yuchen Lin, N/A - Abhilasha Ravichander, N/A - Ximing Lu, N/A - Nouha Dziri, N/A - Melanie Sclar, N/A - Khyathi Chandu, N/A …

AkihikoWatanabe updated 9 months ago
1
lm-sys/FastChat #3004

Add Whole History Rating to Leaderboard?

For the API-based models, there are frequent claims online that users see models getting worse over time. It would be good to know if that's true. Copying a [comment of mine from HF](https://hugging…

endolith updated 6 months ago
2
scrapinghub/article-extraction-benchmark #3

Adding more tools to the benchmark?

Hi, Thanks for your contribution, it's really useful to see evaluations on real-world data! There are further extraction tools for Python which this repository doesn't feature yet and which could b…

adbar updated 4 months ago
7
ollama/ollama #1863

Ollama stuck after few runs

I updated Ollama from 0.1.16 to 0.1.18 and encountered the issue. I am using Python to run LLMs with Ollama and LangChain on a Linux server (4 x A100 GPUs). There are 5,000 prompts to ask and get…

jadhvank updated 3 weeks ago
97
outlines-dev/outlines #837

generate.json() gives ValidationError when run with mistral-…

### Describe the issue as clearly as possible: Example code with Pydantic and generate.json() throws a ValidationError. The code is run from a Jupyter Notebook. Output is OK if age: int is removed from t…

Dodorotata updated 3 months ago
9
EleutherAI/lm-evaluation-harness #1719

When using Accelerate for data parallel inference, using dif…

Hi, @haileyschoelkopf Thank you for your awesome open-source work. We have been evaluating using `lm-eval` and noticed that when using `accelerate` for data parallel inference, the number of GPUs utili…

s1ghhh updated 3 months ago
4
BirgerMoell/swedish-medical-benchmark #1

Evaluate benchmarks from BioMistral to decide which ones are…

https://huggingface.co/datasets/BioMistral/BioInstructQA ![Screenshot 2024-04-03 at 22 32 34](https://github.com/BirgerMoell/swedish-medical-benchmark/assets/1704131/d3eefcb9-cd8a-4983-81c4-fbc00d320…

BirgerMoell updated 5 months ago
5
EleutherAI/lm-evaluation-harness #1675

Support for sequence tagging tasks

We are trying to evaluate Named Entity Recognition and Part of Speech tagging tasks, but it is unclear to us how to do that. We've noticed that `aclue` includes a Named Entity Recognition task but it …

Khalid-Nabigh updated 4 months ago
2
lm-sys/FastChat #3505

Why logistic regression is equivalent to Bradley-Terry model…

Dear maintainers, Thank you for your valuable arena. I am currently researching LLM evaluation methods and got stuck on a question about the Bradley-Terry model. As it stands, from multiple sou…

VityaVitalich updated 5 days ago
1

