-
Title.
Benchmarks:
Evaluation Methodologies
- [x] G-Eval
- [ ] QAG
Coding Ability
- [ ] Aider Benchmark
- [ ] CodeMMLU
Confabulation Rate
- [ ] TruthfulQA
- [ ] f
Context Length
…
-
### What happened?
In CTCAE v5 some lab parameters are graded relative to baseline if baseline was abnormal (e.g., _Alanine aminotransferase increased_, CTCAE grade 1: >ULN - 3.0 x ULN if baseline wa…
-
https://github.com/louis030195/mineflayer-k8s/blob/81876f1eec3a0d04b89acef52f100f5a3266d9b7/plugins/chatToxicity.js#L10
-
### Goal
---
Process the data obtained by merging all datasets (Pubchem, Chembl, Toxric, etc.):
- [ ] - remove duplicates
- [x] - assess the distributions of discrepanc…
-
Is there a way to get the sentence as well, along with the full results generated by the toxicity scanner in the input scanners?
That would be really helpful to have as it would get an …
-
We might be able to find open source toxicity lists.
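
As a rough illustration of how such a list could be used once one is found, here is a minimal sketch. It assumes the list is plain text with one term per line and `#` for comments; that format, the function names `load_toxic_terms` and `find_toxic_terms`, and the placeholder entries are all hypothetical and not taken from any particular project.

```python
import re


def load_toxic_terms(lines):
    """Build a lowercase term set from lines, skipping blanks and '#' comments."""
    terms = set()
    for line in lines:
        term = line.strip().lower()
        if term and not term.startswith("#"):
            terms.add(term)
    return terms


def find_toxic_terms(message, terms):
    """Return the sorted terms that appear as whole words in the message."""
    words = set(re.findall(r"[a-z0-9']+", message.lower()))
    return sorted(words & terms)


# Placeholder entries standing in for a real open source list (assumption).
PLACEHOLDER_LIST = ["# hypothetical word list", "badword", "", "Slur1"]
terms = load_toxic_terms(PLACEHOLDER_LIST)
print(find_toxic_terms("That was a BADWORD, honestly.", terms))
```

Whole-word matching keeps false positives down compared with substring search, but it misses obfuscated spellings, so a list like this would likely be a first filter, not a complete check.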
-
Hello,
I could use some help. We are just getting started with SonarQube and would really like to utilize this plugin. I am currently unable to get it to work. I've followed all of the Installation an…
-
The current evaluation metrics supported by `llm-eval` are robust. However, upon reviewing the documentation, I found that the current repo doesn't account for evaluating model toxicity. Assessing LLM…
-
The failure modes that the DNA probes check for are grouped unclearly in the original setup.
* [x] set current probes to `active=False`, and name them "orig" or "paper" or similar, leaving the original …
-
I keep getting the error message:
ERROR: Could not find a version that satisfies the requirement numpy=2.0.
It couldn't finish setting up the rtp.
Tried it on my local laptop and Anaconda cloud.…