JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0

Explore MS promptbench #931

Open dcecchini opened 10 months ago

dcecchini commented 10 months ago

Explore promptbench, the new tool released by Microsoft for evaluating LLMs.

Brief description:

It consists of a wide range of LLMs and evaluation datasets, covering diverse tasks, evaluation protocols, adversarial prompt attacks, and prompt engineering techniques. As a holistic library, it also supports several analysis tools for interpreting the results. It is designed in a modular fashion, allowing users to build evaluation pipelines for custom projects.

So I think we should check which techniques they use to evaluate the models, as well as the datasets and tasks they support and the analysis tools they provide for interpreting the results.
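To make the comparison concrete, here is a rough sketch of the kind of modular pipeline the description refers to: a dataset, a prompt template, a model, and an output parser are swappable parts, and a prompt attack is just a function applied to the template. Every name below (`EvalPipeline`, `toy_model`, `typo_attack`) is hypothetical and written for illustration only; none of it is promptbench's or langtest's actual API, and the "model" is a keyword-based stand-in rather than a real LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical names throughout; this is not promptbench's or langtest's API.
Example = Tuple[str, str]  # (input text, expected label)


@dataclass
class EvalPipeline:
    dataset: List[Example]
    prompt_template: str                # must contain "{content}"
    model: Callable[[str], str]         # prompt -> raw model output
    parse_output: Callable[[str], str]  # raw output -> predicted label

    def accuracy(self, template: Optional[str] = None) -> float:
        """Score the model on the dataset, optionally with a different template."""
        template = template or self.prompt_template
        correct = sum(
            self.parse_output(self.model(template.format(content=text))) == label
            for text, label in self.dataset
        )
        return correct / len(self.dataset)


def typo_attack(template: str) -> str:
    """Toy prompt perturbation: swap the first two characters of the template."""
    return template[1] + template[0] + template[2:]


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call. It only 'follows' prompts that start with
    'Classify'; otherwise it always answers 'negative'."""
    if not prompt.startswith("Classify"):
        return "negative"
    return "positive" if "good" in prompt else "negative"


TEMPLATE = "Classify the sentiment as positive or negative: {content}"
DATASET: List[Example] = [
    ("good movie", "positive"),
    ("bad plot", "negative"),
    ("really good acting", "positive"),
    ("bad and boring", "negative"),
]

pipeline = EvalPipeline(
    dataset=DATASET,
    prompt_template=TEMPLATE,
    model=toy_model,
    parse_output=lambda raw: raw.strip().lower(),
)

clean = pipeline.accuracy()                          # original prompt
attacked = pipeline.accuracy(typo_attack(TEMPLATE))  # perturbed prompt
print(f"clean accuracy: {clean}, accuracy under attack: {attacked}")
```

With this toy setup the clean accuracy is 1.0 and it drops to 0.5 once the template is perturbed. That gap between clean and perturbed-prompt scores is the sort of measurement worth comparing against what we already do in langtest.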

GitHub link: promptbench