Explore PromptBench, the new tool released by Microsoft for evaluating LLMs.
Brief description:
It consists of a wide range of LLMs and evaluation datasets, covering diverse tasks, evaluation protocols, adversarial prompt attacks, and prompt engineering techniques. As a holistic library, it also supports several analysis tools for interpreting the results. It is designed in a modular fashion, allowing users to build evaluation pipelines for custom projects (see the sketch below).
So, I think we should check which techniques they use to evaluate the models, as well as which datasets and tasks they support and which analysis tools they provide for interpreting the results.
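To get a feel for the modular design, here is a minimal sketch of an evaluation pipeline, adapted from the usage example in the promptbench README. The class and method names (DatasetLoader, LLMModel, Prompt, InputProcess.basic_format, OutputProcess.cls, Eval.compute_cls_accuracy) are as I remember them and should be verified against the current repo:

```python
import promptbench as pb

# Load one of the supported evaluation datasets (SST-2 sentiment classification here).
dataset = pb.DatasetLoader.load_dataset("sst2")

# Load one of the supported models.
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)

# Define one or more prompts; {content} is filled in from each dataset example.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
    "Determine the emotion of the following sentence as positive or negative: {content}",
])

def proj_func(pred):
    # Project the model's free-text output onto the dataset's label space.
    mapping = {"positive": 1, "negative": 0}
    return mapping.get(pred, -1)

for prompt in prompts:
    preds, labels = [], []
    for data in dataset:
        # Fill the prompt template with the current example.
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        # Normalize the raw output into a class label.
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
        labels.append(data["label"])
    # Score this prompt's accuracy on the dataset.
    print(prompt, pb.Eval.compute_cls_accuracy(preds, labels))
```

The same loop structure should let us swap in other datasets, models, adversarial prompt attacks, or prompt engineering techniques, which is the modularity the description advertises.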
GitHub link: https://github.com/microsoft/promptbench