Previous models for the assessment of commitment towards a predicate in a sentence (also known as factuality prediction) were trained and tested against a specific annotated dataset, subsequently limiting the generality of their results. In this work we propose an intuitive method for mapping three previously annotated corpora onto a single factuality scale, thereby enabling models to be tested across these corpora. In addition, we design a novel model for factuality prediction by first extending a previous rule-based factuality prediction system and applying it over an abstraction of dependency trees, and then using the output of this system in a supervised classifier.
In this repository you'll find both the converted corpus and our factuality prediction model.
If you use this resource, please cite the following paper:
@InProceedings{stanovsky2017fact,
author = {Stanovsky, Gabriel and Eckle-Kohler, Judith and Puzikov, Yevgeniy and Dagan, Ido and Gurevych, Iryna},
title = {Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)},
month = {August},
year = {2017},
address = {Vancouver, Canada}
}
Try a live demonstration by heading over to our Online Demo Page
Make sure that the JAVA_HOME variable is set accordingly.
E.g., JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
pip install nltk
python -c "import nltk;nltk.download('wordnet')"
pip install spacy
python -m spacy download en
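The prerequisites above can be checked before running any of the scripts. The following is a minimal sketch, not part of this repository: the function name `missing_prerequisites` is hypothetical, and it only checks that JAVA_HOME is non-empty and that the nltk and spacy packages are importable. It does not verify the WordNet data or the spaCy English model.

```python
import importlib.util
import os


def missing_prerequisites(env=None, modules=("nltk", "spacy")):
    """Return the names of prerequisites that appear to be missing.

    Hypothetical helper: checks the JAVA_HOME variable and whether each
    Python package can be located, without importing it.
    """
    env = os.environ if env is None else env
    missing = []
    if not env.get("JAVA_HOME"):
        missing.append("JAVA_HOME")
    for name in modules:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


print(missing_prerequisites() or "all prerequisites found")
```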
To obtain a snapshot of the unified dataset, please contact us.
From src:
./scripts/download_external_corpora.sh
NOTE: FactBank should be downloaded separately. Please log in to LDC, download the corpus, and place it in the directory factbank_v1 under /data/external_annotations/.
Install the converter:
./scripts/install_converter.sh
Convert to a unified representation:
./scripts/convert_corpora.sh
The converted unified corpus should be created in the unified corpus directory.
Each line corresponds to a word in the sentence, where the following values appear tab separated:
An empty line separates sentences.
Additional values may appear in further tab-separated columns, depending on the input format.
For example, the unified corpus contains dependency parsing, and the automatic tool appends TruthTeller features (see Interactive Usage Examples).
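As an illustration of the layout described above (one word per line, tab-separated values, an empty line between sentences), here is a minimal reading sketch. The function name `read_sentences` and the sample rows are hypothetical; the columns are treated as opaque strings because their meaning depends on the input format.

```python
from typing import Iterable, Iterator, List


def read_sentences(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Yield sentences; each sentence is a list of rows, one row per word.

    A row is the list of tab-separated values found on that line.
    """
    sentence: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # An empty line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        # The last sentence may not be followed by an empty line.
        yield sentence


# Hypothetical sample with three columns per word.
sample = "0\tJohn\t_\n1\trefused\t3.0\n\n0\tMary\t_\n1\tleft\t3.0\n"
sentences = list(read_sentences(sample.splitlines()))
print(len(sentences))
```

To read a converted file instead of the sample, pass an open file handle to `read_sentences`.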
From src, run:
./scripts/install_annotator.sh
Start servers:
Start the spaCy server:
Run ./scripts/run_spacy_server.sh
This will open a server listening on port 8081 by default.
Wait for the ENGINE Bus STARTED message to appear, indicating that the server is up.
In a new terminal, start the PropS server:
Run ./scripts/run_props_server.sh
This will open a server listening on port 10345.
Wait for the Listening on http://:8081/ message to appear, indicating that the server is up.
Run client application:
./scripts/annotate_factuality.sh
This will wait for input on STDIN and will output sentences with CoNLL factuality annotations to STDOUT.
NOTE: You can also run these scripts using different hosts and ports. See the scripts above for instructions on how to do this.
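Instead of watching the terminals for the startup messages, one could poll the two ports until they accept connections. This is a rough sketch under assumptions: `wait_for_port` is a hypothetical helper, not part of the repository scripts, and it only tests that a TCP connection can be opened, not that the server is fully initialized.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Retries every `interval` seconds and returns False if no connection
    could be opened within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); try again shortly.
            time.sleep(interval)
    return False
```

For the default setup described above, calling `wait_for_port("localhost", 8081)` and `wait_for_port("localhost", 10345)` before starting the client should be enough; `localhost` is an assumption for servers started on the same machine.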
echo "John refused to go" | ./scripts/annotate_factuality.sh
0 John _ _ P _ _
1 refused 3.0 -/?NoF P P P
2 to _ _ _ _ _
3 go -3.0 +/-NoF P N N
cat ../examples/example_sentences.txt | ./scripts/annotate_factuality.sh > ../examples/example_sentences.fact.conll
Output can be seen in the CoNLL file.
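The annotated output can be post-processed with a few lines of Python. The sketch below assumes, based only on the "John refused to go" example above, that the second column holds the word and the third column holds a numeric factuality value, with `_` marking words that carry no value. The function name `predicate_scores` is hypothetical.

```python
def predicate_scores(conll_text):
    """Return (word, value) pairs for rows whose third column is numeric.

    Assumption: column 2 is the word and column 3 the factuality value,
    as in the example output; rows with "_" in column 3 are skipped.
    """
    scores = []
    for line in conll_text.splitlines():
        fields = line.split()  # accepts tabs or spaces
        if len(fields) < 3:
            continue  # blank separator line or malformed row
        try:
            scores.append((fields[1], float(fields[2])))
        except ValueError:
            continue  # no numeric value for this word
    return scores


example = """0 John _ _ P _ _
1 refused 3.0 -/?NoF P P P
2 to _ _ _ _ _
3 go -3.0 +/-NoF P N N"""

print(predicate_scores(example))
# prints [('refused', 3.0), ('go', -3.0)]
```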
gabriel (dot) satanovsky (at) gmail (dot) com