
awesome-hallucination-detection

A curated list of papers on hallucination detection in Large Language Models. License: Apache 2.0.

Citing this repository

@misc{MinerviniAHD2024,
  author = {Pasquale Minervini and others},
  title = {awesome-hallucination-detection},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/EdinburghNLP/awesome-hallucination-detection}}
}

Papers and Summaries

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering

MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs

Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs

Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness

DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation

GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework

Lynx: An Open Source Hallucination Evaluation Model

LLMs hallucinate graphs too: a structural perspective

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Unified Hallucination Detection for Multimodal Large Language Models

FactCHD: Benchmarking Fact-Conflicting Hallucination Detection

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

TRUE: Re-Evaluating Factual Consistency Evaluation

TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models

SAC$^3$: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency

Elastic Weight Removal for Faithful and Abstractive Dialogue Generation

Trusting Your Evidence: Hallucinate Less with Context-aware Decoding

When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

Retrieval Augmentation Reduces Hallucination in Conversation

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

How Language Model Hallucinations Can Snowball

Improving Language Models with Advantage-based Offline Policy Gradients

Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models

Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding

HaluEval: A Large-Scale Hallucination Evaluation Benchmark

Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Mitigating Language Model Hallucination with Interactive Question-Knowledge Alignment

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

The Internal State of an LLM Knows When it's Lying

Chain of Knowledge: A Framework for Grounding Large Language Models with Structured Knowledge Bases

Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

Sources of Hallucination by Large Language Models on Inference Tasks

Hallucinations in Large Multilingual Translation Models

Citation: A Key to Building Responsible and Accountable Large Language Models

Zero-Resource Hallucination Prevention for Large Language Models

RARR: Researching and Revising What Language Models Say, Using Language Models

Q²: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering

Do We Know What We Don’t Know? Studying Unanswerable Questions beyond SQuAD 2.0

Chain-of-Verification Reduces Hallucination in Large Language Models

Detecting and Mitigating Hallucinations in Multilingual Summarisation

Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization

Enabling Large Language Models to Generate Text with Citations

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation

Generating Benchmarks for Factuality Evaluation of Language Models

Do Language Models Know When They're Hallucinating References?

Why Does ChatGPT Fall Short in Providing Truthful Answers?

LM vs LM: Detecting Factual Errors via Cross Examination

RHO (ρ): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

ExpertQA: Expert-Curated Questions and Attributed Answers

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators

Complex Claim Verification with Evidence Retrieved in the Wild

FELM: Benchmarking Factuality Evaluation of Large Language Models

Evaluating Hallucinations in Chinese Large Language Models

On Faithfulness and Factuality in Abstractive Summarization

QuestEval: Summarization Asks for Fact-based Evaluation

QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization

Fast and Accurate Factual Inconsistency Detection Over Long Documents

Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

Do Androids Know They're Only Dreaming of Electric Sheep?

Correction with Backtracking Reduces Hallucination in Summarization

Fine-grained Hallucination Detection and Editing for Language Models

LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

Evaluating the Factual Consistency of Abstractive Text Summarization

SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?

Teaching Language Models to Hallucinate Less with Synthetic Tasks

Faithfulness-Aware Decoding Strategies for Abstractive Summarization

KL-Divergence Guided Temperature Sampling

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

Entity-Based Knowledge Conflicts in Question Answering

TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

Detecting hallucinations in large language models using semantic entropy

Domain-specific Entries

Med-HALT: Medical Domain Hallucination Test for Large Language Models

Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning

Overviews, Surveys, and Shared Tasks

Taxonomy from Huang et al.

Taxonomies

Survey of Hallucination in Natural Language Generation classifies metrics into statistical metrics (ROUGE, BLEU, PARENT, Knowledge F1, ...) and model-based metrics; the latter are further structured into information-extraction-based, QA-based, natural-language-inference-based, faithfulness-classification, and LM-based metrics.
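
To make the model-based class concrete, here is a minimal, illustrative sketch of an NLI-based faithfulness score (in the spirit of metrics such as SummaC or TRUE): the probability that the source entails the generated text, with low scores suggesting unsupported content. The checkpoint name is only an example; any MNLI-style classifier can be substituted.

```python
# Illustrative NLI-based faithfulness score: P(source entails generated text).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "microsoft/deberta-large-mnli"  # example MNLI checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME).eval()

def entailment_score(source: str, generated: str) -> float:
    """Return the entailment probability of `generated` given `source`."""
    inputs = tokenizer(source, generated, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    # Look up the entailment class from the config, since label order
    # differs between NLI checkpoints.
    ent_idx = next(i for i, lbl in model.config.id2label.items()
                   if "entail" in lbl.lower())
    return probs[ent_idx].item()

print(entailment_score("Insulin was discovered in 1921 by Banting and Best.",
                       "Insulin was discovered in the 1920s."))
```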

A Survey of Hallucination in “Large” Foundation Models surveys papers and flags whether each addresses detection, mitigation, tasks, datasets, or evaluation metrics. Regarding hallucinations in text, it categorises papers into those on general LLMs, multilingual LLMs, and domain-specific LLMs.

The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models proposes a taxonomy of hallucination types: Entity-error, Relation-error, Incompleteness, Outdatedness, Overclaim, and Unverifiability Hallucinations.

Internal Consistency and Self-Feedback in Large Language Models: A Survey proposes a new perspective, Internal Consistency, for approaching both "enhancing reasoning" and "alleviating hallucinations". This perspective unifies many seemingly unrelated works into a single framework. To improve internal consistency (which in turn enhances reasoning ability and mitigates hallucinations), the paper identifies common elements across various works and summarises them into a Self-Feedback framework.

This framework consists of three components: Self-Evaluation, Internal Consistency Signal, and Self-Update.
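
As a rough illustration of how these three components fit together, the following sketch runs a self-evaluate / signal / self-update loop; the prompts are hypothetical and `llm` stands for any prompt-to-text callable, so this is a toy rendering of the framework rather than the survey's reference implementation.

```python
from typing import Callable

def self_feedback_loop(question: str, llm: Callable[[str], str],
                       max_rounds: int = 3) -> str:
    """Toy Self-Feedback loop: Self-Evaluation -> consistency signal -> Self-Update.
    `llm` is any prompt-to-text callable; the prompts below are hypothetical."""
    answer = llm(f"Answer the question concisely.\nQuestion: {question}")
    for _ in range(max_rounds):
        # Self-Evaluation: the model critiques its own draft answer.
        critique = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any factual errors or internal inconsistencies, or reply NONE."
        )
        # Internal Consistency Signal: stop once the model reports nothing to fix.
        if critique.strip().upper().startswith("NONE"):
            break
        # Self-Update: revise the answer using the critique.
        answer = llm(
            f"Question: {question}\nDraft answer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer, fixing the issues listed in the critique."
        )
    return answer
```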

Measuring Hallucinations in LLMs

Open Source Models for Measuring Hallucinations

Definitions and Notes

Extrinsic and Intrinsic Hallucinations

Neural Path Hunter defines an extrinsic hallucination as an utterance that introduces a new span of text that does not correspond to any valid triple in a knowledge graph (KG), and an intrinsic hallucination as an utterance that misuses either the subject or the object of a KG triple such that there is no direct path between the two entities. Survey of Hallucination in Natural Language Generation defines an extrinsic hallucination as generated output that cannot be verified from the source content, and an intrinsic hallucination as generated output that contradicts the source content.
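
For intuition only, here is a toy sketch of the KG-based distinction. It simplifies the "no direct path" condition to an entity-membership check over the KG, so it is an approximation of the Neural Path Hunter definitions, not the paper's method.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def classify_triple(triple: Triple, kg: Set[Triple]) -> str:
    """Toy approximation of the Neural Path Hunter definitions:
    - faithful: the extracted triple is present in the KG;
    - intrinsic hallucination: both entities are known to the KG, but the
      triple itself is not (subject/object misused);
    - extrinsic hallucination: the utterance introduces entities the KG
      does not contain at all."""
    if triple in kg:
        return "faithful"
    entities = {s for s, _, _ in kg} | {o for _, _, o in kg}
    subj, _, obj = triple
    if subj in entities and obj in entities:
        return "intrinsic hallucination"
    return "extrinsic hallucination"

kg = {("Edinburgh", "capital_of", "Scotland"),
      ("Glasgow", "located_in", "Scotland")}
print(classify_triple(("Glasgow", "capital_of", "Scotland"), kg))  # intrinsic hallucination
print(classify_triple(("Edinburgh", "capital_of", "Narnia"), kg))  # extrinsic hallucination
```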