DAFNA-EA is a Java library of truth discovery methods from the literature, used to evaluate the veracity of data claimed by multiple online sources.
The methods implemented for the comparative study are listed below; more details can be found here.
Real-world data sets are available here.
A dataset generator for truth discovery scenarios can be downloaded here; a description of its parameters, with full documentation, is given here.
To cite DAFNA-EA in publications, use:
For LaTeX users:
@article{waguih2014truth,
  author  = {Dalia Attia Waguih and Laure Berti{-}Equille},
  title   = {Truth Discovery Algorithms: An Experimental Evaluation},
  journal = {CoRR},
  volume  = {abs/1409.6428},
  year    = {2014},
  url     = {http://arxiv.org/abs/1409.6428}
}
To cite the ensembling of truth discovery methods, use:
For LaTeX users:
@inproceedings{bertiequille2015ensembling,
  author    = {Laure Berti{-}Equille},
  title     = {Data veracity estimation with ensembling truth discovery methods},
  booktitle = {2015 {IEEE} International Conference on Big Data, Big Data 2015, Santa Clara, CA, USA, October 29 - November 1, 2015},
  pages     = {2628--2636},
  year      = {2015}
}
For a survey:
For LaTeX users:
@book{bertiequille2015veracity,
  author    = {Laure Berti{-}Equille and Javier Borge{-}Holthoefer},
  title     = {Veracity of Data: From Truth Discovery Computation Algorithms to Models of Misinformation Dynamics},
  series    = {Synthesis Lectures on Data Management},
  publisher = {Morgan {\&} Claypool Publishers},
  year      = {2015}
}
Two tutorials surveying truth discovery methods and the topic of data veracity are available here.
We have released an API so that users can test the truth discovery methods on their own. Documentation of the API is available here.
You can try the demos:
Make sure you have Java 7 and Maven installed on your computer first. Before the first build, you need to install some libraries into your local repository:
mvn org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file -Dfile=lib/independance-vldb2010-solomon-cleaner.jar \
-DgroupId=com.att.research -DartifactId=solomon.cleaner -Dversion=0.0.1 -Dpackaging=jar -DlocalRepositoryPath=my-repo
mvn org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file -Dfile=lib/simmetrics_jar_v1_6_2_d07_02_07.jar \
-DgroupId=uk.ac.shef.wit -DartifactId=simmetrics -Dversion=1.6.2 -Dpackaging=jar -DlocalRepositoryPath=my-repo
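As a convenience, the two installs above can be wrapped in a small helper. This is only a sketch, not part of the repository; it prints the commands instead of running them (remove the leading echo to execute):

```shell
# Dry-run helper for the two install-file commands above: it prints each
# mvn invocation instead of running it (remove "echo" to execute).
install_jar() {
  # $1=jar file, $2=groupId, $3=artifactId, $4=version
  echo mvn org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file \
    "-Dfile=$1" "-DgroupId=$2" "-DartifactId=$3" \
    "-Dversion=$4" -Dpackaging=jar -DlocalRepositoryPath=my-repo
}

install_jar lib/independance-vldb2010-solomon-cleaner.jar com.att.research solomon.cleaner 0.0.1
install_jar lib/simmetrics_jar_v1_6_2_d07_02_07.jar uk.ac.shef.wit simmetrics 1.6.2
```

The coordinates (groupId, artifactId, version) are taken verbatim from the two commands above.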
To build a jar containing all the algorithms, ready for consumption by the AllegatorTrack UI and API:
mvn clean # cleans previously created jar
mvn package # builds everything
or simply
mvn clean package
This will build a jar in the target folder. Just copy it to AllegatorTrack:
cp target/DAFNA-EA-1.0-jar-with-dependencies.jar <AllegatorTrack-root>/vendor
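The copy step can be guarded so it fails with a hint rather than a cryptic error when the build was skipped. This is a sketch; ALLEGATOR_ROOT is an assumed variable standing in for your <AllegatorTrack-root> path:

```shell
# Sketch: copy the built jar into AllegatorTrack's vendor directory,
# printing a hint if "mvn clean package" has not been run yet.
# ALLEGATOR_ROOT is a placeholder for your <AllegatorTrack-root> path.
JAR=target/DAFNA-EA-1.0-jar-with-dependencies.jar
if [ -f "$JAR" ]; then
  cp "$JAR" "$ALLEGATOR_ROOT/vendor/"
else
  echo "missing $JAR; run 'mvn clean package' first" >&2
fi
```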
Import the project as a Maven project, then build it normally (recommended).
Alternatively, import it as a Java project and set the classpath manually to include all dependencies listed in pom.xml.
java -jar <JAR_PATH> <ALGORITHM_NAME> <DATASETS_CLAIMS_DIR> <DATASETS_GROUND_DIR> <OUTPUT_DIR> <ALGORITHM_PARAMS>
Where <JAR_PATH> points to the jar file generated in the build section. <ALGORITHM_NAME> is the name of the algorithm, which can be one of the following: Cosine, 2-Estimates, 3-Estimates, Depen, Accu, AccuSim, AccuNoDep, TruthFinder, SimpleLCA, GuessLCA, MLE, or LTM.
<DATASETS_CLAIMS_DIR> and <DATASETS_GROUND_DIR> point to the directories containing the CSV claim files and ground-truth files, and <OUTPUT_DIR> is the directory where all output files should be generated.
<ALGORITHM_PARAMS> is a list of whitespace-separated values that depend on the selected algorithm.
In all cases, general parameters come first, followed by algorithm-specific parameters.
Details of the parameters for each algorithm can be found here.
There are 3 possible patterns for <ALGORITHM_PARAMS>:
4 general parameters followed by the algorithm-specific parameters (normal invocation).
4 general parameters all set to 0, followed by the number of algorithms to be combined, and then that many file paths pointing to the claim results generated by the corresponding algorithms before calling the combiner. Example:
java -jar <JAR_PATH> <ALGORITHM_NAME> <DATASETS_CLAIMS_DIR> <DATASETS_GROUND_DIR> <OUTPUT_DIR> 0 0 0 0 3 results1.csv results2.csv results3.csv
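The combiner's argument list is easy to get wrong because the count must match the number of result files. The sketch below, with hypothetical jar, algorithm, and file names, prints the assembled command and derives the count automatically:

```shell
# Sketch: print a combiner invocation. The jar name, algorithm name, and
# result files are hypothetical placeholders; $# supplies the algorithm count.
combiner_cmd() {
  # $1=jar, $2=algorithm, $3=claims dir, $4=ground dir, $5=output dir,
  # remaining arguments: one claim-results file per combined algorithm
  jar=$1 algo=$2 claims=$3 ground=$4 out=$5
  shift 5
  echo java -jar "$jar" "$algo" "$claims" "$ground" "$out" 0 0 0 0 $# "$@"
}

combiner_cmd dafna.jar SomeCombiner claims/ ground/ output/ \
  results1.csv results2.csv results3.csv
```

Because the file count comes from $#, adding or removing a results file keeps the command consistent.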
5 extra parameters added at the end (allegation invocation):
java -jar <JAR_PATH> <ALGORITHM_NAME> <DATASETS_CLAIMS_DIR> <DATASETS_GROUND_DIR> <OUTPUT_DIR> <ALGORITHM_PARAMS> <RUN_ID> <CLAIM_ID> <CLAIM_RESULTS_FILE> <SOURCE_TRUSTWORTHINESS_FILE> Allegate
Where <ALGORITHM_PARAMS> is the same as in a normal invocation, and <RUN_ID> and <CLAIM_ID> denote the run id and the claim id being allegated, respectively. These can be anything and are only used for convenience, to generate meaningful file names in the output. <CLAIM_RESULTS_FILE> and <SOURCE_TRUSTWORTHINESS_FILE> point to the results generated by the run being allegated. Allegate should be passed as-is.
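Put together, an allegation invocation looks like the sketch below. All concrete values (jar path, parameter values, ids, file names) are hypothetical placeholders, and the command is only printed, not run:

```shell
# Sketch: print an allegation invocation. All values below are placeholders;
# the algorithm parameters are passed as a single quoted string and split
# back into individual arguments (hence the unquoted $6).
allegate_cmd() {
  # $1=jar, $2=algorithm, $3=claims dir, $4=ground dir, $5=output dir,
  # $6=algorithm params, $7=run id, $8=claim id,
  # $9=claim results file, $10=source trustworthiness file
  echo java -jar "$1" "$2" "$3" "$4" "$5" $6 "$7" "$8" "$9" "${10}" Allegate
}

allegate_cmd dafna.jar TruthFinder claims/ ground/ output/ \
  "p1 p2 p3 p4" run42 claim7 results_run42.csv trust_run42.csv
```

Here p1 p2 p3 p4 stand in for TruthFinder's actual parameter values, and run42/claim7 are arbitrary identifiers, as noted above.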