-
## Preface
Evaluating the accuracy of the output of an NLP component is a science in itself.
When a new NLP algorithm, method or tool is published, it is always accompanied by benchmarks against…
-
Hello, I'm attempting to apply the xmoverscore metric to a novel dataset.
I ran `main.py`, and it generated the following files, which I organized into results directories.
Are the Pearson corr…
-
Automated Migration for Breaking Changes to the Type System
## Search Terms
codemod, migration, upgrade
## Suggestion
Semi-automated migration when there are breaking changes in the type s…
-
Today's MT-bench results are very different from yesterday's, even though the answers being judged are the same. The GPT-4 API seems to have changed now that, as of today, all users can access it (possibly a quantized model?).
-
# Books for book club
## Business-related:
1. [Range: Why Generalists Triumph in a Specialized World](https://www.amazon.com.au/dp/0735214484/) by David Epstein
> about inefficiency, failing …
-
https://aclanthology.org/P19-1502/
-
https://aclanthology.org/W05-0909/
-
just why
-
## We have multiple _potential_ use cases for recording causal relationships between phenotypes:
* Defining phenotypes that specify aetiology - e.g. hemolytic anemia MP:0001585 DEF: deficiency of …
-
# Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)
Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators.
[https://eugeneyan.com/writing/llm-evaluators/…