awslabs / dgl-ke

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
https://dglke.dgl.ai/doc/
Apache License 2.0
1.28k stars, 196 forks

Option to specify input format using column indices #107

Open: schelv opened this issue 4 years ago

schelv commented 4 years ago

Allow users to specify the relevant column indices of the input files directly (e.g. `triplets_column_indices=[1, 0, 2]`). Currently you have to specify a format string such as `hrt` or `rht`, which is converted internally by `_parse_srd_format` into column indices such as `[0, 1, 2]` or `[1, 0, 2]`. Specifying the indices directly would have the added advantage of supporting input files with unused columns (such as qualifiers or sources).
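A minimal sketch of the proposal, assuming tab-separated input and assuming the indices are given in (head, relation, tail) order, which is consistent with `rht` mapping to `[1, 0, 2]`. The names `format_to_indices`, `read_triplets`, and `column_indices` are hypothetical; this is not dgl-ke's actual loader or its `_parse_srd_format`.

```python
from typing import Iterable, List, Tuple


def format_to_indices(fmt: str) -> List[int]:
    """Hypothetical: turn a format string like "hrt" or "rht" into the
    column positions of (head, relation, tail)."""
    if sorted(fmt) != ["h", "r", "t"]:
        raise ValueError(f"format must be a permutation of 'hrt', got {fmt!r}")
    return [fmt.index("h"), fmt.index("r"), fmt.index("t")]


def read_triplets(lines: Iterable[str],
                  column_indices: List[int]) -> List[Tuple[str, str, str]]:
    """Hypothetical: pick (head, relation, tail) from each tab-separated line
    using explicit column indices. All other columns are ignored."""
    h_col, r_col, t_col = column_indices
    triplets = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        cols = line.split("\t")
        triplets.append((cols[h_col], cols[r_col], cols[t_col]))
    return triplets


# Toy rows laid out as relation, head, tail, plus an unused "source" column.
rows = ["P31\tQ42\tQ5\tenwiki\n", "P27\tQ42\tQ145\tenwiki\n"]
print(read_triplets(rows, [1, 0, 2]))
```

Here `[1, 0, 2]` selects the same columns as the `rht` format would, and the fourth column is silently skipped. A format string can only describe exactly three columns, so it cannot express this layout.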

It would also be great if this were possible for the ID mapping files. The dataset I want to use has the columns `property_id`, `en_label`, and `en_description`. It cannot be loaded with the code from this pull request, because the label and ID are in the wrong order and there is an unused column. Being able to specify something like `relations_map_column_indices=[1, 0]` would be very convenient.
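The same idea applied to an ID mapping file, as a minimal sketch. It assumes tab-separated lines and assumes the two indices mean (name column, ID column), so `[1, 0]` reads the label from `en_label` (column 1) and the ID from `property_id` (column 0), with `en_description` ignored. The function `read_id_map` is hypothetical and not part of dgl-ke.

```python
from typing import Dict, Iterable, Sequence


def read_id_map(lines: Iterable[str],
                column_indices: Sequence[int]) -> Dict[str, str]:
    """Hypothetical: build a name -> ID mapping from a tab-separated file.
    column_indices is assumed to be (name_column, id_column); any other
    columns (e.g. descriptions) are ignored."""
    name_col, id_col = column_indices
    mapping: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        cols = line.split("\t")
        mapping[cols[name_col]] = cols[id_col]
    return mapping


# Toy rows with the columns property_id, en_label, en_description.
rows = [
    "P31\tinstance of\t(description)\n",
    "P27\tcountry of citizenship\t(description)\n",
]
print(read_id_map(rows, [1, 0]))
```

The IDs stay as strings such as `P31`. A training pipeline would likely still need to assign contiguous integer IDs afterwards, which this sketch does not attempt.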

classicsong commented 4 years ago

This is a good point. We will provide Python APIs in the 0.2.0 release; at that point, users will be able to define their own Dataset loader.