Fix multiple issues with user defined data

classicsong commented 4 years ago

This PR fixes several bugs and adds some enhancements to the support for user-defined data.

[x] [BUG] #85, Force the user to provide a dataset name when using udd or raw_udd. (This avoids setting 'FB15k' as the dataset name of user-defined data.)
[x] [Enhance] Add checks for udd input
[x] [Doc] #99, Better documentation for udd
[x] [Enhance] #97, Allow users to specify the delimiter

schelv commented 4 years ago

Hi, I have two suggestions/questions that would make it easier to train using a User-Defined knowledge graph.

1. From the new documentation:

Here we assume that both the entity ids and relation ids start from 0 and are continuous

Is it possible to drop this assumption? I have a dataset that already has a mapping of entity names to ids, and the ids are not continuous. Otherwise I would have to either create another mapping from the old ids to the new ids, or use the raw_udd option and let the program do this automatically. Neither option seems practical.

2. I've described the idea in #107

Thanks for the awesome repository!

classicsong commented 4 years ago

For the first point: currently we require the node ids to be continuous so that the trained embeddings will not contain untrained data. This assumption could be relaxed if we allowed a user-defined dataloader, because the user would then need to provide the max node ID.
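
To illustrate the concern about untrained data, here is a minimal sketch. It is not dgl-ke code: the function name `count_untrained_rows` is hypothetical, and it assumes the embedding table needs one row for every id from 0 up to the max node ID.

```python
def count_untrained_rows(entity_ids):
    """Count rows of an embedding table that no entity ever uses.

    Assumption: the table has one row per id in 0..max_id, so any id
    in that range that is missing from the data is never trained.
    """
    used = set(entity_ids)
    table_rows = max(used) + 1  # rows needed to cover ids 0..max_id
    return table_rows - len(used)


# Three entities with sparse ids still need a 43-row table.
print(count_untrained_rows([3, 10, 42]))  # 40
# Contiguous ids starting from 0 leave no unused rows.
print(count_untrained_rows([0, 1, 2]))  # 0
```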

For now, I suggest you use raw_udd.
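
If you would rather keep your own mapping, the remapping mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions, not how dgl-ke or raw_udd implements it: the helper `remap_ids` is hypothetical, and triples are assumed to be (head, relation, tail) tuples.

```python
def remap_ids(triples):
    """Map arbitrary entity and relation ids to contiguous ids from 0.

    Ids are assigned in order of first appearance. The returned maps
    (old id -> new id) let you translate trained embeddings back.
    """
    entity_map, relation_map = {}, {}
    remapped = []
    for head, rel, tail in triples:
        # setdefault returns the existing new id, or assigns the next one.
        h = entity_map.setdefault(head, len(entity_map))
        r = relation_map.setdefault(rel, len(relation_map))
        t = entity_map.setdefault(tail, len(entity_map))
        remapped.append((h, r, t))
    return remapped, entity_map, relation_map


triples = [(100, 7, 205), (205, 9, 100), (31, 7, 100)]
remapped, entity_map, relation_map = remap_ids(triples)
print(remapped)      # [(0, 0, 1), (1, 1, 0), (2, 0, 0)]
print(entity_map)    # {100: 0, 205: 1, 31: 2}
print(relation_map)  # {7: 0, 9: 1}
```

Saving `entity_map` and `relation_map` alongside the remapped triples keeps the original ids recoverable after training.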

awslabs / dgl-ke

Fix multiple issues with user defined data #105