python train_factuality_clf.py \
--train_data_filepath data/xent-probs/train.json \
--test_data_filepath data/xent-probs/test.json \
--pickled_clf_path factuality-classifiers/knn-20n.pickle \
--n_neighbors 20
python generate_fbs_summaries.py --test_size 100
python evaluate_summaries.py --test_size 100
python iterative_constraints.py \
--data_subset full \
--batch_size 16 \
--classifier_batch_size 16 \
--max_iterations 100 \
--pickled_classifier factuality-classifiers/v2-knn-20n.pickle
python iterative_constraints.py \
--data_subset full \
--batch_size 16 \
--classifier_batch_size 16 \
--max_iterations 100 \
--pickled_classifier factuality-classifiers/v2-knn-20n.pickle \
--model_summarization google/pegasus-xsum
python compute_rouge_scores.py
conda create -n factual-beam-search python=3.8
conda activate factual-beam-search
pip install -r requirements.txt
pytest tests
python iterative_constraints.py --data_subset test|debug --batch_size 4 --verbose 1
python iterative_constraints.py --data_subset test|debug --batch_size 4 --verbose 1 --pickled_classifier factuality-classifiers/v0-knn.pickle
python annotate_summaries.py --test_size 100
streamlit run app.py
To create a dataset of prior and posterior probabilities of XEnt named entities, run compute_probs.py. For example, to run on the train split:
python compute_probs.py \
--xent_split train \
--output_filepath data/xent-probs/train.json
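A multi-token entity's probability has to be aggregated from per-token model scores. The sketch below shows one common aggregation (product of token probabilities, i.e. exp of summed log-probs); the input format and span convention are illustrative assumptions, since compute_probs.py obtains its actual scores from the summarization model:

```python
import math

def entity_prob(token_logprobs, span):
    """Probability of an entity span = product of its tokens' conditional
    probabilities, computed as exp of the summed log-probs over the span."""
    start, end = span  # half-open token index range [start, end)
    return math.exp(sum(token_logprobs[start:end]))

# Two tokens, each with probability 0.5 -> entity probability 0.25
p = entity_prob([math.log(0.5)] * 4, (0, 2))
```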
As a proof of concept of a non-oracle named entity factuality classifier, we train a non-parametric model on the prior and posterior probabilities, plus whether the named entity overlaps with the source, to discriminate between factual and non-factual entities.
To train, pickle, and test this classifier, run the following:
$ python train_factuality_clf.py \
--train_data_filepath data/xent-probs/train.json \
--test_data_filepath data/xent-probs/test.json \
--pickled_clf_path optional/path/to/newly/trained/knn.pickle
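The classifier above can be sketched as a scikit-learn k-nearest-neighbors model. The three-column feature layout (prior probability, posterior probability, source overlap) follows the description above, but the exact feature engineering, toy data, and neighbor count here are illustrative assumptions, not the repository's implementation:

```python
import io
import pickle
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Assumed feature layout: [prior_prob, posterior_prob, overlaps_source]
X_train = np.array([
    [0.02, 0.01, 0.0],  # unlikely entity, absent from source
    [0.05, 0.03, 0.0],
    [0.60, 0.85, 1.0],  # likely entity, present in source
    [0.40, 0.70, 1.0],
])
y_train = np.array([0, 0, 1, 1])  # 1 = factual, 0 = non-factual (toy labels)

# The commands above pass --n_neighbors 20; with only four toy points we use 1
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)

# Pickle the fitted model, as train_factuality_clf.py does via --pickled_clf_path
buf = io.BytesIO()
pickle.dump(clf, buf)

pred = clf.predict(np.array([[0.55, 0.80, 1.0]]))
```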