-
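## Title: WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models
## Link: https://arxiv.org/abs/2408.03837
## Summary:
WalledEval is a comprehensive AI safety testing toolkit designed to evaluate the safety of large language models (LLMs). It supports a wide range of models, including both open-weight and API-based models, and multi…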
-
See [log](https://dart-ci.appspot.com/log/vm-kernel-linux-debug-x64/dartk-weak-asserts-linux-debug-x64/21449/co19/LanguageFeatures/nnbd/const_evaluation_A10_t01):
```
Unhandled exception:
Expect.id…
```
-
# Main todos:
- [ ] Check whether all move-sorting tables are correctly allocated, cleared, or scaled, and prepare some tests for them. Clearing may not be necessary.
- [ ] Prepare and test better coef …
-
It complains about missing DLL files: `libwinpthread-1`, `libstdc++-6`, `zlib1`, `libzstd`
Also, having the Windows build packaged as tar.gz seems kind of weird; usually files for Windows are packaged …
-
## Description
Users express the need for data schema evaluation to enable "fail-fast" capabilities during data loading and consistency checks before execution. They highlight the potential benefits …
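
To make the "fail-fast" idea concrete, here is a minimal sketch of a schema check that runs at load time, before any downstream execution. It is an illustration only: `SchemaError`, `validate_schema`, and the dict-of-types schema format are hypothetical names chosen for this example, not an API of the project.

```python
from typing import Any, Iterable, Mapping


class SchemaError(ValueError):
    """Raised as soon as a loaded row violates the expected schema (hypothetical)."""


def validate_schema(rows: Iterable[Mapping[str, Any]],
                    schema: Mapping[str, type]) -> None:
    """Stop at the first row that is missing a column or has a wrong type.

    `schema` maps column names to expected Python types; this format is an
    assumption made for the sketch.
    """
    for index, row in enumerate(rows):
        # Fail fast on missing columns before looking at any values.
        missing = set(schema) - set(row)
        if missing:
            raise SchemaError(f"row {index}: missing columns {sorted(missing)}")
        # Then check each declared column against its expected type.
        for column, expected in schema.items():
            if not isinstance(row[column], expected):
                raise SchemaError(
                    f"row {index}: column {column!r} expected "
                    f"{expected.__name__}, got {type(row[column]).__name__}"
                )
```

Calling `validate_schema(rows, {"id": int, "name": str})` right after loading raises on the first inconsistent row, so the error surfaces before any expensive processing starts.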
-
**Is your feature request related to a problem? Please describe.**
The old platform supported string-driven templates.
I'm wondering if you all will be supporting that indefinitely-- mainly becau…
-
Not sure if this exists, but a dataset of food service health/safety scores would be interesting to work with.
Example: https://data.cityofchicago.org/Health-Human-Services/Food-Inspect…
-
Here are some ideas and potential areas of research for Tensort:
- Model analysis and interpretability: Develop new techniques for analyzing and understanding what large language models have learned …
-
### Contact Details
vicente.herrera@control-plane.io
### What is the idea
During a working session we started talking about responsible AI, something we look forward to start getting into…
-
The current evaluation metrics supported by `llm-eval` are robust. However, upon reviewing the documentation, I found that the current repo doesn't account for evaluating model toxicity. Assessing LLM…