weboftruth is a project to use deep representation learning to learn fact embeddings, with applications to social science and disinformation.
Formally, a Knowledge Graph consists of a set of facts, where each fact is a relation edge connecting a head entity and a tail entity. Existing packages like torchkge allow you to learn vector representations for entities and relationships given their context in the training data.
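Concretely, the facts can be held as (head, relation, tail) string triples, and embedding code typically works with integer indices rather than strings. Below is a minimal, library-free sketch of that indexing step. `index_triples` and the toy facts are hypothetical illustrations, not weboftruth or torchkge code.

```python
# Minimal sketch: map (head, relation, tail) string triples to integer IDs.
# index_triples and the toy facts are hypothetical, for illustration only.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def index_triples(
    triples: List[Triple],
) -> Tuple[Dict[str, int], Dict[str, int], List[Tuple[int, int, int]]]:
    """Assign integer IDs to entities and relations in order of first appearance."""
    ent2ix: Dict[str, int] = {}
    rel2ix: Dict[str, int] = {}
    indexed: List[Tuple[int, int, int]] = []
    for head, rel, tail in triples:
        # Heads and tails share one entity vocabulary; relations get their own.
        h = ent2ix.setdefault(head, len(ent2ix))
        r = rel2ix.setdefault(rel, len(rel2ix))
        t = ent2ix.setdefault(tail, len(ent2ix))
        indexed.append((h, r, t))
    return ent2ix, rel2ix, indexed

facts = [
    ("India", "locatedIn", "Asia"),
    ("France", "locatedIn", "Europe"),
    ("Paris", "capitalOf", "France"),
]
ent2ix, rel2ix, indexed = index_triples(facts)
print(indexed)  # [(0, 0, 1), (2, 0, 3), (4, 1, 2)]
```

Because heads and tails share one entity vocabulary, an entity such as France gets a single ID (and later a single embedding) whether it appears as a head or a tail.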
weboftruth adds value by:
This repo contains Python and bash scripts for training Knowledge Graph embeddings from head-relation-tail triples.
Two embedding spaces are created: one for entities (subjects/objects) and one for relationships (verbs). The repo makes extensive use of torchkge, a package built on PyTorch that implements KGE (Knowledge Graph Embedding) algorithms such as TransE.
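TransE treats each relation as a translation in the embedding space: for a plausible fact, the head vector plus the relation vector should land near the tail vector, so a triple can be scored by the negative distance -||h + r - t||. Below is a minimal sketch of that scoring rule using hand-picked two-dimensional vectors. `transe_score` and the toy vectors are hypothetical. A full implementation such as torchkge's also learns the vectors and handles batching, which this sketch does not attempt.

```python
# Minimal, library-free sketch of TransE scoring: a relation acts as a
# translation, so for a true triple head + relation should be close to tail.
# The vectors are hand-picked toy values, not learned embeddings.
import math
from typing import Sequence

def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float], norm: int = 2) -> float:
    """Return -||h + r - t|| under the L1 or L2 norm; values closer to 0 are more plausible."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return -sum(abs(d) for d in diffs)
    return -math.sqrt(sum(d * d for d in diffs))

india = [1.0, 0.0]
asia = [1.0, 1.0]
europe = [3.0, 1.0]
located_in = [0.0, 1.0]

# (India, locatedIn, Asia) should outscore (India, locatedIn, Europe).
print(transe_score(india, located_in, asia) > transe_score(india, located_in, europe))  # True
```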
Clone this repo, a useful set of standard Knowledge Graph datasets compiled by GitHub user simonepri (many thanks),
OR
If using your own dataset, organize it as follows:
Store one fact per line as a tab-separated triple, head\trelation\ttail, for example India\tlocatedIn\tAsia.
Choose a dataset name {dataset_name} and a location {datapath}, then save the train, test, and validation sets as {datapath}/{dataset_name}/edges_as_text_train.tsv, edges_as_text_test.tsv and edges_as_text_valid.tsv respectively.
Run a command such as the one below, customizing it as required.
python ./weboftruth/weboftruth/wotmodels.py \
-e 200 \
-m 'TransE' \
-lr 0.00005 \
-dp ./datasets-knowledge-embedding \
-ds 'KINSHIP' \
-mp ./weboftruth/models \
-ts 80
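The command above reads the three split files described earlier. Before a long training run, it can save time to check that every line really has three tab-separated fields. Below is a minimal standard-library sketch. It assumes the files have no header row; `load_split` and the `TOY` dataset name are hypothetical and not part of weboftruth.

```python
# Minimal sketch: sanity-check a dataset laid out as
# {datapath}/{dataset_name}/edges_as_text_{train,test,valid}.tsv.
# Assumes no header row. load_split is a hypothetical helper, not weboftruth code.
import csv
import os
import tempfile
from typing import List, Tuple

def load_split(datapath: str, dataset_name: str, split: str) -> List[Tuple[str, str, str]]:
    """Read one split ('train', 'test' or 'valid') and return its triples."""
    path = os.path.join(datapath, dataset_name, f"edges_as_text_{split}.tsv")
    triples = []
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: treat quote characters inside entity names literally.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for lineno, row in enumerate(reader, start=1):
            if not row:  # skip blank lines
                continue
            if len(row) != 3:
                raise ValueError(f"{path}:{lineno}: expected 3 tab-separated fields, got {len(row)}")
            triples.append((row[0], row[1], row[2]))
    return triples

# Usage: build a throwaway dataset named "TOY" and load its training split.
with tempfile.TemporaryDirectory() as datapath:
    os.makedirs(os.path.join(datapath, "TOY"))
    for split in ("train", "test", "valid"):
        path = os.path.join(datapath, "TOY", f"edges_as_text_{split}.tsv")
        with open(path, "w", encoding="utf-8") as f:
            f.write("India\tlocatedIn\tAsia\n")
    train = load_split(datapath, "TOY", "train")

print(train)  # [('India', 'locatedIn', 'Asia')]
```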
Flag meanings:
-e: number of epochs
-m: model name, such as 'TransE', 'DistMult', or any of the other models provided by torchkge
-lr: learning rate
-dp: datapath, the path to the directory containing your datasets
-ds: dataset name; this should be a subdirectory of dp
-mp: modelpath, the path to the directory containing your saved models
-ts: truth-share, a parameter that causes (100-ts)% of the training set to be corrupted before learning begins

This project was part of the coursework for CAPP 30255 at the University of Chicago, by Aabir Abubaker Kar and Adarsh Mathew.
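The -ts (truth-share) flag corrupts (100-ts)% of the training triples, but the text does not say how a triple is corrupted. The sketch below assumes one common choice: replace either the head or the tail with a different, randomly chosen entity, while keeping a flag that records which triples remain true. `corrupt_training_set` is a hypothetical name, and weboftruth's actual procedure may differ.

```python
# Minimal sketch of applying a truth-share of ts percent: corrupt (100 - ts)%
# of the training triples by swapping the head or the tail for a different
# random entity. ASSUMPTION: this corruption scheme is illustrative only.
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]

def corrupt_training_set(triples: List[Triple], ts: int, seed: int = 0) -> Tuple[List[Triple], List[bool]]:
    """Return (possibly corrupted) triples and a parallel list of is_true flags."""
    if not 0 <= ts <= 100:
        raise ValueError("ts must be between 0 and 100")
    rng = random.Random(seed)
    # Needs at least two distinct entities so a replacement always exists.
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    n_corrupt = round(len(triples) * (100 - ts) / 100)
    to_corrupt = set(rng.sample(range(len(triples)), n_corrupt))
    out: List[Triple] = []
    is_true: List[bool] = []
    for i, (h, r, t) in enumerate(triples):
        if i in to_corrupt:
            if rng.random() < 0.5:
                h = rng.choice([e for e in entities if e != h])  # corrupt the head
            else:
                t = rng.choice([e for e in entities if e != t])  # corrupt the tail
        out.append((h, r, t))
        is_true.append(i not in to_corrupt)
    return out, is_true

# Usage: with ts=80, 20% of 10 toy triples (i.e. 2) are corrupted.
triples = [(f"e{i}", "rel", f"e{i + 1}") for i in range(10)]
corrupted, flags = corrupt_training_set(triples, ts=80)
print(flags.count(False))  # 2
```

A corrupted triple can occasionally coincide with another true fact. A more careful version would check for this and resample.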
data: contains the data sources used, specifically the SVO dataset of subject-verb-object triples from Wikipedia, and our constructed 'partially true' datasets
logs: execution logs for RCC and AWS runs
notebooks: Jupyter notebooks used for early prototyping
shscripts: shell scripts used for RCC
weboftruth: Python code organized into a package; borrows heavily from torch and torchkge