bakerwho / weboftruth

Torch + bash scripts for training a web of truth (literal objective incontrovertible truth, because it obviously exists)

weboftruth

weboftruth is a project to use deep representation learning to learn fact embeddings, with applications to social science and disinformation.

Formally, a Knowledge Graph consists of a number of facts, where each fact is a relation edge connecting a head entity to a tail entity. Existing packages like torchkge let you learn vector representations for entities and relations given their context in the training data.
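As a rough illustration of this definition, the sketch below stores facts as head-relation-tail triples and assigns each distinct entity and relation an integer index, which is the usual first step before learning embeddings. The names `Fact` and `build_vocab`, and the second example fact, are hypothetical and are not taken from this repo or from torchkge.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One edge of the Knowledge Graph: head --relation--> tail."""
    head: str
    relation: str
    tail: str


def build_vocab(facts):
    """Map each distinct entity and each distinct relation to an integer index."""
    entities, relations = {}, {}
    for fact in facts:
        # Heads and tails share a single entity vocabulary.
        for name in (fact.head, fact.tail):
            entities.setdefault(name, len(entities))
        relations.setdefault(fact.relation, len(relations))
    return entities, relations


facts = [
    Fact("India", "locatedIn", "Asia"),
    Fact("Delhi", "capitalOf", "India"),  # hypothetical second fact
]
entity_ids, relation_ids = build_vocab(facts)
```

Indices are assigned in order of first appearance, so the same input always yields the same mapping.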

weboftruth adds value by:

This repo contains Python and bash scripts for training Knowledge Graph embeddings using head-relation-tail triples.

Two embedding spaces are created: one for entities (subjects and objects) and one for relationships (verbs). The repo makes extensive use of torchkge, a PyTorch-based package that implements KGE algorithms such as TransE.
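To make the two embedding spaces concrete, here is a minimal dependency-free sketch of the idea behind TransE, which models a true fact as head + relation ≈ tail. It uses plain Python lists rather than PyTorch or torchkge, and the function names, the embedding size, and the random initialization are all assumptions made for illustration, not this repo's implementation.

```python
import math
import random


def init_embeddings(n_items, dim, rng):
    """Create one vector per item, drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_items)]


def transe_distance(head, relation, tail):
    """L2 distance between (head + relation) and tail; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


rng = random.Random(0)
dim = 4  # hypothetical embedding size
entity_emb = init_embeddings(n_items=3, dim=dim, rng=rng)    # entity space
relation_emb = init_embeddings(n_items=2, dim=dim, rng=rng)  # relation space

# Distance for the triple (entity 0, relation 0, entity 1).
d = transe_distance(entity_emb[0], relation_emb[0], entity_emb[1])
```

Training would adjust both tables so that observed triples get small distances and corrupted triples get large ones; that step is omitted here.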

How to use weboftruth

  1. Clone the datasets-knowledge-embedding repo, a useful set of standard Knowledge Graph datasets compiled by GitHub user simonepri (many thanks)

    OR

  2. If using your own dataset, organize it as follows:

    • Ensure your Knowledge Graph dataset has a finite set of discrete entities and a finite set of discrete relationships, and store it as a Tab-Separated-Value (TSV) file.
      • Format: head\trelation\ttail
      • Example line: India\tlocatedIn\tAsia
      • DO NOT include a header line
    • Create a folder {dataset_name} at a location {datapath}
    • Split your KG into train, test, and validation sets
    • Write them to {datapath}/{dataset_name}/edges_as_text_train.tsv, edges_as_text_test.tsv and edges_as_text_valid.tsv respectively
  3. Run a command such as the one below. Customize as required.

    python ./weboftruth/weboftruth/wotmodels.py \
        -e 200 \
        -m 'TransE' \
        -lr 0.00005 \
        -dp ./datasets-knowledge-embedding \
        -ds 'KINSHIP' \
        -mp ./weboftruth/models \
        -ts 80
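If you are preparing your own dataset (step 2), the split-and-write step could be scripted roughly as follows. This is only a sketch: the function name `split_and_write`, the 80/10/10 split, and the fixed seed are assumptions, and only the file names and the headerless head\trelation\ttail format come from the instructions above.

```python
import csv
import random
from pathlib import Path


def split_and_write(triples, datapath, dataset_name,
                    train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle triples, split them, and write three headerless TSV files."""
    triples = list(triples)
    random.Random(seed).shuffle(triples)  # seeded for a reproducible split
    n_train = int(len(triples) * train_frac)
    n_valid = int(len(triples) * valid_frac)
    splits = {
        "train": triples[:n_train],
        "valid": triples[n_train:n_train + n_valid],
        "test": triples[n_train + n_valid:],  # whatever remains
    }
    folder = Path(datapath) / dataset_name
    folder.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        path = folder / f"edges_as_text_{name}.tsv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            # One head<TAB>relation<TAB>tail row per fact, no header line.
            csv.writer(f, delimiter="\t", lineterminator="\n").writerows(rows)
    return {name: len(rows) for name, rows in splits.items()}
```

For example, `split_and_write(triples, "./data", "MYKG")` would create `./data/MYKG/edges_as_text_train.tsv` and its test and validation counterparts, where `./data` and `MYKG` are placeholder values for {datapath} and {dataset_name}.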

Flag meanings:

Experiment results:

This project was part of coursework for CAPP 30255 at the University of Chicago, by Aabir Abubaker Kar and Adarsh Mathew.

Filestructure: