human-judgment Search Results

1000+ results for human-judgment

cadmiumcr/cadmium #33

Proposal: Evaluator and/or Benchmark repositories

## Preface Evaluating the accuracy of the output of an NLP component is a science in itself. When a new NLP algorithm, method or tool is published, it is always accompanied by benchmarks against…

rmarronnier updated 5 years ago
5
AIPHES/ACL20-Reference-Free-MT-Evaluation #5

Applying xmoverscore to novel dataset

Hello, I'm attempting to apply the xmoverscore metric to a novel dataset. I ran `main.py`, and it generated the following files, which I organized into results directories. Are the person corr…

billray0259 updated 2 years ago
1
microsoft/TypeScript #33345

Automated Migration for Breaking Changes to the Type System

Automated Migration for Breaking Changes to the Type System ## Search Terms codemod, migration, upgrade ## Suggestion semi-automated migration when there are breaking changes in the type s…

mheiber updated 5 years ago
7
lm-sys/FastChat #1898

MT-bench results are different today

Today's MT-bench results are very different from yesterday's results (same answer). The GPT-4 API seems to have changed since today all users can use the GPT-4 API (probably quantized ?)

imoneoi updated 1 year ago
6
Blackmill/book-club #153

Book suggestions

# Books for book club ## Business-related: 1. [Range: Why Generalists Triumph in a Specialized World](https://www.amazon.com.au/dp/0735214484/) by David Epstein > about inefficiency, failing …

elle updated 5 months ago
4
AkihikoWatanabe/paper_notes #990

Studying Summarization Evaluation Metrics in the Appropriate…

https://aclanthology.org/P19-1502/

AkihikoWatanabe updated 1 year ago
2
AkihikoWatanabe/paper_notes #669

METEOR: An Automatic Metric for MT Evaluation with Improved …

https://aclanthology.org/W05-0909/

AkihikoWatanabe updated 1 year ago
1
samuelmarina/is-even #158

why

just why

NextinMono updated 3 years ago
9
obophenotype/upheno #625

Causal relationships between phenotypes

## We have multiple _potential_ use cases for recording causal relationships between phenotypes: * Defining phenotypes that specify aetiology - e.g. hemolytic anemia MP:0001585 DEF: deficiency of …

dosumis updated 4 years ago
3
eugeneyan/eugeneyan-comments #85

https://eugeneyan.com/writing/llm-evaluators/

# Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge) Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators. [https://eugeneyan.com/writing/llm-evaluators/…

utterances-bot updated 3 months ago
4