evaluate-llm Search Results

1000+ results for evaluate-llm

All-Hands-AI/OpenHands #4544

[Evaluation]: On evaluation, log actual messages sent to LLM…

**What problem or use case are you trying to solve?** On evaluation, such as SWE-bench, we are currently logging the instruction for each individual repo, but not actually logging the queries sent …

neubig updated 1 month ago
8
yujonglee/eval #36

Add compare-based model eval

> Rather than asking an LLM for a direct evaluation (via giving a score), try giving it a reference and asking for a comparison. This helps with reducing noise.

yujonglee updated 1 year ago
2
finos/ai-readiness #30

Tracking issue for authoring of Threats and Controls

We need to start dividing up the work, authoring the various Threats / Controls. We'll use this issue to manage that work and their assignments. Each threat / control is 'ticked' when assigned to …

ColinEberhardt updated 3 weeks ago
23
geekan/MetaGPT #1530

data-interpreter needs a dataset generation function

Sometimes, ds tasks need training data from a web page, or generated by an LLM. data-interpreter should decide whether to get useful training data from a web page or generate useful data by itself. I mean i…

CoderYiFei updated 1 month ago
1
sabszh/EER-chatbot-UI #7

How do RAGs / LLMs understand time?

It seems like an important aspect of what we're interested in has to do with time. - Things happen before other things that have a causal effect. Does the LLM understand this? - When you give it a…

lightnin updated 7 months ago
3
openai/human-eval #45

Evaluation doesn't work on Windows

After getting a score of 0 every time, I looked at the samples.jsonl_results.jsonl file and the result for each is this: "failed: module 'signal' has no attribute 'setitimer'" This seems like a Wi…

peter-ch updated 2 months ago
4
open-compass/opencompass #480

Gaokao and some datasets appear many zero when I evaluate th…

### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the ex…

kkwhale7 updated 1 year ago
15
ollama/ollama #5495

The quality of the results returned by the embedding model …

### What is the issue? The quality of the results returned by the embedding model now is much worse than the previous version. ### OS Linux ### GPU Nvidia ### CPU Intel ### Ollama version 0.1…

wwjCMP updated 3 weeks ago
5
vinid/safety-tuned-llamas #3

Prompt and seed instructions for malicious instruction gener…

In the paper in appendix B.2, you briefly describe how you generate the malicious instructions dataset. Could you share the prompt and seed instructions you used to generate this dataset? And how did …

RobertKirk updated 9 months ago
1
DS4SD/docling #74

docling vs GROBID

### Issue: Comparing GROBID and Docling for Parsing Scholarly Publications #### **My Use Case** We need to parse and extract all relevant information from (1000s) of scholarly publications, such…

sdspieg updated 1 month ago
4
