awslabs / dgl-ke

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
https://dglke.dgl.ai/doc/
Apache License 2.0

Fix multiple issues with user defined data #105

Closed by classicsong 4 years ago

classicsong commented 4 years ago

This PR fixes several bugs and improves support for user-defined data.

schelv commented 4 years ago

Hi, I have two suggestions/questions that would make it easier to train using a User-Defined knowledge graph.

  1. From the new documentation:

    Here we assume the both the entities ids and relation ids start from 0 and should be contineous

Is it possible to drop this assumption? I have a dataset that already maps entity names to IDs, and those IDs are not contiguous. Otherwise I would have to create a second mapping from the old IDs to new ones, or use the raw_udd option and let the program do the remapping automatically. Neither option seems practical.

  2. I've described this idea in #107.
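Regarding the first point, the "second mapping" the question mentions is a small amount of code. The sketch below assumes triples are held in memory as `(head, relation, tail)` integer tuples; `build_contiguous_map` and `remap_triples` are hypothetical helpers written for illustration, not part of DGL-KE. It assigns new IDs in `0..N-1` in sorted order of the original IDs, separately for entities and relations.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail), using the original IDs


def build_contiguous_map(ids: Iterable[int]) -> Dict[int, int]:
    """Map each distinct original ID to a new ID in 0..N-1, in sorted order."""
    return {old: new for new, old in enumerate(sorted(set(ids)))}


def remap_triples(triples: List[Triple]):
    """Return remapped triples plus the entity and relation maps used."""
    # Entities can appear as either head or tail, so collect both positions.
    entity_map = build_contiguous_map(
        [h for h, _, _ in triples] + [t for _, _, t in triples]
    )
    # Relations get their own independent 0-based ID space.
    relation_map = build_contiguous_map(r for _, r, _ in triples)
    remapped = [
        (entity_map[h], relation_map[r], entity_map[t]) for h, r, t in triples
    ]
    return remapped, entity_map, relation_map


if __name__ == "__main__":
    triples = [(10, 7, 500), (500, 3, 42), (42, 7, 10)]
    remapped, ent_map, rel_map = remap_triples(triples)
    print(remapped)  # [(0, 1, 2), (2, 0, 1), (1, 1, 0)]
```

To translate trained embeddings back to the original IDs, keep the inverse map, e.g. `{new: old for old, new in ent_map.items()}`, and use it to look up which original entity each embedding row belongs to.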

Thanks for the awesome repository!

classicsong commented 4 years ago

Regarding the first point: we currently require node IDs to be contiguous so that the trained embeddings do not contain untrained entries. This assumption could be relaxed if we used a user-defined data loader, because the user would then need to provide the maximum node ID.
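The concern about untrained entries can be illustrated with a short sketch. It assumes, for illustration only, that an embedding table is indexed directly by entity ID and therefore needs `max_id + 1` rows; `untrained_rows` is a hypothetical helper, not DGL-KE code. Any ID in that range that never appears in a training triple corresponds to a row that never receives an update.

```python
from typing import Iterable, List


def untrained_rows(entity_ids: Iterable[int]) -> List[int]:
    """Rows of a (max_id + 1)-row table that no training triple ever touches."""
    used = set(entity_ids)
    num_rows = max(used) + 1  # table sized by the maximum ID
    return [i for i in range(num_rows) if i not in used]


if __name__ == "__main__":
    # Sparse IDs: rows 1, 3 and 4 would keep their initial values forever.
    print(untrained_rows([0, 2, 5]))  # [1, 3, 4]
    # Contiguous IDs: every row is used.
    print(untrained_rows([0, 1, 2]))  # []
```

With contiguous IDs the table size equals the number of distinct entities, which is why remapping (or letting raw_udd do it) avoids the problem.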

For now, I suggest using raw_udd.