Using pre-trained SimplE Wikidata5m model files

DeepGraphLearning / graphvite

GraphVite: A General and High-performance Graph Embedding System

https://graphvite.io

Apache License 2.0


Using pre-trained SimplE Wikidata5m model files #74

kyteinsky closed this issue 3 years ago.

kyteinsky commented 3 years ago

Hello,

I've read the docs and the GitHub pages, but I couldn't find how to get all the nodes connected to a particular node in the knowledge graph. I'm new to this field and to GraphVite, so could you explain the difference between a model and an app?

For example, this model object:

import pickle

with open("simple_wikidata5m.pkl", "rb") as fin:
    model = pickle.load(fin)

versus the app object used in the provided Colab tutorial.

Lastly, how can I get all nodes connected to a particular entity in the graph from the model object?

Thanks!!

KiddoZhu commented 3 years ago

They refer to almost the same thing, except that a model is a non-executable dump of the application's parameters. To turn a model into an application, create an application with the same hyperparameters and then call app.load_model(model).

The model dump contains only the embeddings, not the original graph. To get entity neighbors, you can manually parse the file gv.dataset.wikidata5m.train. See here for the format of Wikidata5m.
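A minimal sketch of that manual parsing, assuming each line of the training file is a tab-separated head, relation, tail triplet (the same split on "\t" used in the parsing snippet in this thread). The function name neighbors_of is hypothetical and not part of GraphVite. It streams the lines once and keeps only the triplets that touch the requested entity, so the full graph never has to sit in memory.

```python
def neighbors_of(lines, entity):
    """Collect every triplet that touches `entity`.

    `lines` can be any iterable of tab-separated "head relation tail"
    strings, such as an open file object, so the file is streamed once.
    Returns {"out": [(relation, tail), ...], "in": [(relation, head), ...]}:
    "out" holds triplets where `entity` is the head, "in" where it is the tail.
    """
    result = {"out": [], "in": []}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        head, relation, tail = parts
        if head == entity:
            result["out"].append((relation, tail))
        if tail == entity:
            result["in"].append((relation, head))
    return result
```

For example, inside with open(gv.dataset.wikidata5m.train, "r") as fin: you would call neighbors_of(fin, "Q42"), where "Q42" stands in for whichever entity ID you are interested in.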

kyteinsky commented 3 years ago

Thanks a lot for the clarification.

The code that I ran just now:

import pickle
import graphvite.application as gap

with open("transe_wikidata5m.pkl", "rb") as fin:
    model = pickle.load(fin)
app = gap.KnowledgeGraphApplication(dim=512)
app.load_model(model)

At this point it starts printing tons of IDs (Qxxxxxxxx). Is that normal?

Does app have any functions for parsing the graph to obtain entity neighbours?

KiddoZhu commented 3 years ago

There's no need to load the model back into an application unless you want to fine-tune it further. You can access model.solver.entity_embeddings or model.solver.relation_embeddings directly.

app doesn't have a direct interface to the graph. You need to parse the original training file to manipulate the graph structure.

For example, minimal parsing code may look like this:

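import graphvite as gv

# Each line of the training file is a tab-separated (head, relation, tail) triplet.
triplets = []
with open(gv.dataset.wikidata5m.train, "r") as fin:
    for line in fin:
        triplets.append(line.strip().split("\t"))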
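If you need neighbors for many entities, the triplets list from the parsing code above can be turned into an adjacency index once. This is a sketch under the assumption that direction and relation type can be ignored; the name build_adjacency is hypothetical. On the full Wikidata5m training set this index may use a lot of memory.

```python
from collections import defaultdict


def build_adjacency(triplets):
    """Map each entity to the set of entities it shares a triplet with.

    Edge direction and relation type are ignored, so both the head and the
    tail of every triplet record each other as neighbors.
    """
    adjacency = defaultdict(set)
    for triplet in triplets:
        if len(triplet) != 3:
            continue  # skip malformed rows, e.g. from a trailing blank line
        head, _relation, tail = triplet
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    # Return a plain dict so lookups of unknown IDs don't silently add keys.
    return dict(adjacency)
```

Looking up adjacency.get("Q42", set()) then returns the neighbors of that entity, or an empty set if it never appears in the training file.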

Entities and relations are encoded as "Qxxxx" and "Pxxxx" IDs, respectively. To access their embeddings, use model.graph.entity2id and model.graph.relation2id to convert the IDs to indices.
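A minimal sketch of that ID-to-embedding lookup, assuming (as described above) that model.graph.entity2id and model.graph.relation2id behave like Python dicts mapping IDs to integer row indices into model.solver.entity_embeddings and model.solver.relation_embeddings. The helper names entity_embedding, relation_embedding and id_to_entity are hypothetical, not GraphVite functions.

```python
def entity_embedding(model, entity_id):
    """Return the embedding row for an entity ID such as "Q42"."""
    index = model.graph.entity2id[entity_id]
    return model.solver.entity_embeddings[index]


def relation_embedding(model, relation_id):
    """Return the embedding row for a relation ID such as "P31"."""
    index = model.graph.relation2id[relation_id]
    return model.solver.relation_embeddings[index]


def id_to_entity(model):
    """Build the reverse mapping (row index -> entity ID), which is handy
    for turning a row index of the embedding matrix back into its ID."""
    return {index: entity for entity, index in model.graph.entity2id.items()}
```

With the unpickled model from earlier in the thread, entity_embedding(model, "Q42") would return the vector for that entity, assuming the ID is present in entity2id (otherwise a KeyError is raised).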