A toolkit for understanding factuality & consistency errors in summarization models.
A harness for generating text summaries with automated factuality evaluations
An interactive query interface for exploring generated summaries (e.g. XSum or a custom dataset)
An interactive query interface for n-gram lookup
Setup (python 3.8):
pip install -r requirements.txt
pip install .
streamlit run interface/app.py
You can also run each interface individually, e.g.
streamlit run interface/summary_interface.py
Development setup (python 3.8):
pip install -r requirements.dev.txt
pip install -Ue .
Before committing:
black sumtool/ interface/ scripts/
flake8 sumtool/ interface/ scripts/
Create a GitHub personal access token so Colab can clone your private repositories (see GitHub: Creating a Personal Access Token).
Create a new Colab notebook and set the runtime type to GPU
Add the following commands in the first cell to clone the repository and install the requirements
!git clone https://[your-git-token]@github.com/cs6741/summary-analysis.git
!pip install -r /content/summary-analysis/requirements.txt
Add the following command to run the text generation script
!python /content/summary-analysis/generate_xsum_summary.py --bbc_ids [idx1,idx2] --data_split [train|test]
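For illustration, the two flags in the command above might be parsed as follows. This is a hedged sketch, not the script's actual argument handling; the example ids are made up:

```python
import argparse

def parse_args(argv):
    """Parse the generation script's CLI flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Generate XSum summaries")
    # --bbc_ids takes a comma-separated list of BBC article ids
    parser.add_argument(
        "--bbc_ids",
        type=lambda s: s.split(","),
        required=True,
    )
    # --data_split selects which XSum split the ids are looked up in
    parser.add_argument("--data_split", choices=["train", "test"], default="test")
    return parser.parse_args(argv)

# Example invocation with hypothetical ids
args = parse_args(["--bbc_ids", "34687720,24403775", "--data_split", "test"])
print(args.bbc_ids)
```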
Pipeline for storage:
/data/<dataset>/<model-id>-summaries.json
  <document_id>:
    summary: the generated summary
    metadata: ...metadata for the generated summary, e.g. annotations / score / entropy
/data/<dataset>/<model-id>-metrics.json
  <document_id>:
    ...metrics for a stored summary, e.g. rouge-score, bert-score
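The storage layout above can be exercised with a short standard-library sketch. The helper names (`write_summaries`, `read_summaries`) and the model id are illustrative assumptions, not the project's actual API:

```python
import json
import tempfile
from pathlib import Path

def write_summaries(root, dataset, model_id, summaries):
    """Write {document_id: {"summary": ..., "metadata": ...}} to
    <root>/<dataset>/<model-id>-summaries.json."""
    path = Path(root) / dataset / f"{model_id}-summaries.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(summaries, indent=2))
    return path

def read_summaries(root, dataset, model_id):
    """Load the summaries file back into a dict keyed by document id."""
    path = Path(root) / dataset / f"{model_id}-summaries.json"
    return json.loads(path.read_text())

# One entry following the schema above (hypothetical id and metadata)
demo = {
    "12345678": {
        "summary": "A short generated summary.",
        "metadata": {"entropy": 1.23},
    }
}
root = tempfile.mkdtemp()  # stand-in for the repo's /data directory
write_summaries(root, "xsum", "facebook-bart-large-xsum", demo)
print(read_summaries(root, "xsum", "facebook-bart-large-xsum"))
```

The `-metrics.json` files follow the same pattern, with per-document metric dicts instead of summary/metadata pairs.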