JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0

Two layer evaluation #918

Closed Prikshit7766 closed 10 months ago

Prikshit7766 commented 11 months ago

Description

Robustness testing aims to evaluate the ability of a model to maintain consistent performance when faced with various perturbations or modifications in the input data. For LLMs, this involves understanding how changes in capitalization, punctuation, typos, contractions, and contextual information affect their prediction performance.

A two-layer method is used to compare the expected_result with the actual_result.

[Figure: two_layer_evaluation]

This dual-layered approach enhances the robustness of our evaluation metric, allowing for adaptability in scenarios where direct comparisons may fall short.
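To make the idea concrete, here is a minimal sketch of a two-layer comparison. It is not the code from this PR. The function names `normalize` and `two_layer_match`, the `threshold` parameter, and the choice of a `difflib` similarity ratio as the second layer are all assumptions made for illustration. The description above only says that a second layer helps where direct comparison falls short.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def two_layer_match(expected_result: str, actual_result: str, threshold: float = 0.8) -> bool:
    """Hypothetical two-layer check: direct comparison first, similarity fallback second."""
    expected = normalize(expected_result)
    actual = normalize(actual_result)

    # Layer 1: direct comparison of the normalized strings.
    if expected == actual:
        return True

    # Layer 2 (assumed fallback): character-level similarity ratio in [0, 1].
    # A real implementation might use embeddings or an LLM judge instead.
    similarity = SequenceMatcher(None, expected, actual).ratio()
    return similarity >= threshold


if __name__ == "__main__":
    print(two_layer_match("Paris", "  paris "))
    print(two_layer_match("The capital is Paris", "The capital is Paris."))
    print(two_layer_match("Paris", "Berlin"))
```

The second layer only runs when the first one fails, so outputs that already match exactly skip the similarity computation.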


➤ Fixes # (issue)

Type of change

Please delete options that are not relevant.

Usage

Checklist:

Screenshots (if appropriate):