AI-secure / DecodingTrust

A Comprehensive Assessment of Trustworthiness in GPT Models
https://decodingtrust.github.io/
Creative Commons Attribution Share Alike 4.0 International
232 stars 52 forks source link

Difference to HELM benchmark #1

Closed ogencoglu closed 11 months ago

ogencoglu commented 1 year ago

Thanks for the work.

Would be great if the difference of your work and HELM benchmarks can be mentioned in README somewhere. There seems to be lots of overlaps at first glance.

chenweixin107 commented 11 months ago

Thanks for the question. Overall, the DecodingTrust focuses on the comprehensive trustworthiness evaluations for LLMs, while HELM mainly focuses on comprehensive benign evaluation scenarios.

The detailed differences lie in three folds:

  1. Considered perspectives (Comprehensive/holistic evaluation vs. trustworthiness evaluation) HELM and DecodingTrust have some overlapping perspectives, such as toxicity, bias, robustness, and fairness. However, since HELM lays emphasis on comprehensiveness, it includes perspectives like calibration and efficiency, which is not considered in DecodingTrust. And since DecodingTrust lays emphasis on trustworthiness, it includes as many trustworthiness perspectives as possible, and includes perspectives like ethics and privacy, which is not considered in HELM.

  2. For each overlapping perspective, we also have a different focus from HELM. Taking “robustness” as an example, HELM studies robustness against (a) small semantics-preserving perturbations and (b) semantics-altering perturbations. In contrast, DecodingTrust considers three high-level robustness—(a) adversarial robustness, (b) OOD robustness, (c) robustness against adversarial demonstrations. And under each type of robustness, we further adopt various kinds of perturbations. (e.g., for (c), we study robustness against counterfactual demonstrations/spurious correlations in demonstrations/backdoored demonstrations)

  3. Findings As a very concise summarization, DecodingTrust reveals (a) the performance of GPT models under different trustworthiness perspectives, and (b) the resilience of their performance in adversarial environments (e.g., adversarial system/user prompts, demonstrations). However, the second aspect hasn’t been well explored in HELM.