Explore PromptBench, the new tool released by Microsoft for evaluating LLMs.
Brief description:
It consists of a wide range of LLMs and evaluation datasets, covering diverse tasks, evaluation protocols, adversarial prompt attacks, and prompt engineering techniques. As a holistic library, it also supports several analysis tools for interpreting the results. It is designed in a modular fashion, allowing users to build evaluation pipelines for custom projects (see the sketch below).
So, I think we should check which techniques they use to evaluate the models, as well as which datasets and tasks they support and which analysis tools they provide for interpreting the results.
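To get a feel for the modular design, here is a minimal sketch of an evaluation pipeline, adapted from the usage example in the promptbench README. The class and method names (DatasetLoader, LLMModel, Prompt, InputProcess.basic_format, OutputProcess.cls, Eval.compute_cls_accuracy) are as I remember them and should be verified against the current repo:

```python
import promptbench as pb

# Load one of the supported evaluation datasets (SST-2 sentiment classification here).
dataset = pb.DatasetLoader.load_dataset("sst2")

# Load one of the supported models.
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)

# Define one or more prompts; {content} is filled in from each dataset example.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
    "Determine the emotion of the following sentence as positive or negative: {content}",
])

def proj_func(pred):
    # Project the model's free-text output onto the dataset's label space.
    mapping = {"positive": 1, "negative": 0}
    return mapping.get(pred, -1)

for prompt in prompts:
    preds, labels = [], []
    for data in dataset:
        # Fill the prompt template with the current example.
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        # Normalize the raw output into a class label.
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
        labels.append(data["label"])
    # Score this prompt's accuracy on the dataset.
    print(prompt, pb.Eval.compute_cls_accuracy(preds, labels))
```

The same loop structure should let us swap in other datasets, models, adversarial prompt attacks, or prompt engineering techniques, which is the modularity the description advertises.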
GitHub link: https://github.com/microsoft/promptbench