bakerwho / weboftruth

Torch + bash scripts for training a web of truth (literal objective incontrovertible truth, because it obviously exists)

Choice of Dataset: SVO, WikiData, FB15k-237 #1

Open adarshmathew opened 4 years ago

adarshmathew commented 4 years ago

SVO: The Subject-Verb-Object (SVO) tensor data consists of a large collection of triplets (subject, verb, direct object) extracted from Wikipedia, where each member of a triplet is a single word belonging to the WordNet lexicon (http://wordnet.princeton.edu): a noun for the subject or direct object, and a verb for the middle member. This dataset can be seen as a 3-mode tensor depicting ternary relationships between nouns and verbs.
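To make the "3-mode tensor" view concrete, here is a minimal sketch. It assumes the triplets are already loaded as (subject, verb, object) string tuples; the function name `build_svo_tensor` and the toy triplets are hypothetical and not part of the SVO release or this repo. Each observed triplet becomes a nonzero entry at (subject, verb, object) of a nouns x verbs x nouns binary tensor, stored sparsely as a set of coordinates.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, direct object)


def build_svo_tensor(triples: Iterable[Triple]):
    """Hypothetical helper: map SVO word triplets to coordinates of a sparse
    binary 3-mode tensor of shape (n_nouns, n_verbs, n_nouns).

    Subjects and objects share one noun index because both are nouns.
    """
    nouns: Dict[str, int] = {}
    verbs: Dict[str, int] = {}
    coords: Set[Tuple[int, int, int]] = set()
    for subj, verb, obj in triples:
        # setdefault assigns the next free index only to unseen words
        i = nouns.setdefault(subj, len(nouns))
        j = verbs.setdefault(verb, len(verbs))
        k = nouns.setdefault(obj, len(nouns))
        coords.add((i, j, k))  # one nonzero entry per distinct triplet
    shape = (len(nouns), len(verbs), len(nouns))
    return nouns, verbs, coords, shape


# Toy example (hypothetical triplets, not taken from the SVO data)
nouns, verbs, coords, shape = build_svo_tensor(
    [("dog", "chase", "cat"), ("cat", "chase", "mouse"), ("dog", "eat", "bone")]
)
print(shape)  # (4, 2, 4)
```

Sharing one noun index across the subject and object modes is a design choice so that the same word gets the same representation in both roles. If a Torch tensor is needed, the coordinate set could be passed to something like `torch.sparse_coo_tensor`.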

FB15k-237: The FB15k dataset was introduced in Bordes et al., 2013. It is a subset of Freebase containing 14,951 entities and 1,345 relations. Toutanova et al. first found that it suffers from major test leakage through inverse relations: a large number of test triples can be obtained simply by inverting triples in the training set. To create a dataset without this property, Toutanova et al. introduced FB15k-237, a subset of FB15k from which inverse relations are removed.
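The inverse-relation leakage can be checked directly on any train/test split. The sketch below is a simple diagnostic, not the filtering procedure Toutanova et al. used to build FB15k-237. It assumes triples are (head, relation, tail) string tuples; the function name `inverse_leakage` and the toy relation names are hypothetical. A test triple (h, r, t) counts as leaked if some training triple (t, r', h) exists, so it could be "predicted" by inverting a training triple (r' may equal r for symmetric relations).

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def inverse_leakage(train: Iterable[Triple], test: Iterable[Triple]):
    """Hypothetical diagnostic: return the fraction of test triples (h, r, t)
    for which a training triple (t, r', h) exists, plus a Counter of the
    (r, r') relation pairs responsible."""
    # Map each training (head, tail) pair to the relations linking it
    rels_by_pair: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for h, r, t in train:
        rels_by_pair[(h, t)].add(r)

    test = list(test)
    pair_counts: Counter = Counter()
    leaked = 0
    for h, r, t in test:
        # .get avoids inserting empty entries into the defaultdict
        inverse_rels = rels_by_pair.get((t, h), set())
        if inverse_rels:
            leaked += 1
            for r_inv in inverse_rels:
                pair_counts[(r, r_inv)] += 1
    rate = leaked / len(test) if test else 0.0
    return rate, pair_counts


# Toy split with hypothetical relation names
train = [("berlin", "capital_of", "germany"), ("paris", "capital_of", "france")]
test = [("germany", "has_capital", "berlin"), ("rome", "capital_of", "italy")]
rate, pairs = inverse_leakage(train, test)
print(rate)  # 0.5
```

A high rate concentrated in a few (r, r') pairs is the signature of the leakage described above; running this on FB15k versus FB15k-237 would be one way to see the difference for our own splits.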

The SVO dataset is less a knowledge graph than a semantic/linguistic relationship graph: its triplets are extracted from Wikipedia text, with every word drawn from the WordNet lexicon. Is it appropriate for our task?