dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[Feature] Provide dgl.data.GraphLoader APIs to simplify loading data from disk and creating DGLGraph. #2088

Closed classicsong closed 4 years ago

classicsong commented 4 years ago

🚀 Feature

The DGL data GraphLoader will provide easy-to-use APIs that let users load their own datasets as a DGLGraph, with node and edge features included by default.

We will provide a set of APIs for parsing raw data from CSV-style files.
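
As a rough illustration of what parsing a CSV-style file into a categorical node feature could involve, here is a minimal standard-library sketch. It is not the proposed DGL implementation: the helper name `load_category_feature`, the presence of a header row, and the sample rows are all assumptions made for this example.

```python
import csv
import io


def load_category_feature(lines, id_col, feat_col, separator="|"):
    """Parse delimiter-separated rows and one-hot encode one categorical column.

    Hypothetical helper, not part of DGL. Assumes the input has a header row.
    Returns (node_ids, categories, one_hot_rows).
    """
    reader = csv.DictReader(lines, delimiter=separator)
    rows = [(row[id_col], row[feat_col]) for row in reader]
    # Sort the distinct category values so the column order is deterministic.
    categories = sorted({value for _, value in rows})
    index = {value: i for i, value in enumerate(categories)}
    node_ids = [node_id for node_id, _ in rows]
    one_hot = []
    for _, value in rows:
        vec = [0] * len(categories)
        vec[index[value]] = 1
        one_hot.append(vec)
    return node_ids, categories, one_hot


# Made-up sample rows; a real file may have no header row.
sample = io.StringIO(
    "id|age|gender|occupation\n"
    "1|24|M|technician\n"
    "2|53|F|other\n"
    "3|23|M|writer\n"
)
node_ids, categories, one_hot = load_category_feature(sample, "id", "gender")
# node_ids   -> ['1', '2', '3']
# categories -> ['F', 'M']
# one_hot    -> [[0, 1], [1, 0], [0, 1]]
```

The node IDs stay as strings here, which is one way to read the `int_id=False` option in the example that follows; the actual behavior of that option is not specified in this issue.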

Motivation

An example user experience:

# node features for 'user' nodes
user_loader = dgl.data.NodeFeatureLoader(input='u.user',
                                         separator="|",
                                         int_id=False)
user_loader.addCategoryFeature(cols=["id", "gender"], node_type='user')
user_loader.addWord2VecFeature(cols=["id", "occupation"], node_type='user')
user_loader.addNumericalBucketFeature(cols=["id", "age"],
                                      range=[0, 100],
                                      bucket_cnt=10,
                                      slide_window_size=5,
                                      node_type='user')

# node features for 'movie' nodes
movie_loader = dgl.data.NodeFeatureLoader(input='u.item',
                                          separator="|")
movie_loader.addMultiHotFeature(cols=["id", "Action", "Adventure", "Animation",
                                      "Children", "Comedy", "Crime", "Documentary",
                                      "Drama", "Fantasy", "Film-Noir", "Horror",
                                      "Musical", "Mystery", "Romance", "Sci-Fi",
                                      "Thriller", "War", "Western"],
                                norm=True,
                                node_type='movie')
movie_loader.addWord2VecFeature(cols=["id", "title"],
                                language=['en_lang', 'fr_lang'],
                                node_type='movie')

# label loader
train_label_loader = dgl.data.NodeLabelLoader(input='u1.base',
                                              separator="\t",
                                              multi_label=False)
train_label_loader.addSet(cols=["user", "movie", "rating"],
                          edge_type=['user', 'movie'],
                          split_rate=[.8, .2, 0])
test_label_loader = dgl.data.NodeLabelLoader(input='u1.test',
                                             separator="\t",
                                             multi_label=False)
test_label_loader.addTestSet(cols=["user", "movie", "rating"],
                             edge_type=['user', 'movie'])

# register the feature and label loaders with the graph loader
graphloader = dgl.data.GraphLoader()
graphloader.appendFeature(movie_loader)
graphloader.appendFeature(user_loader)
graphloader.appendLabel(train_label_loader)
graphloader.appendLabel(test_label_loader)
graphloader.addReverseEdges()

# parse all inputs and build the whole graph
graphloader.process()

g = graphloader.graph
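
The example above does not define what `addNumericalBucketFeature` computes. One plausible reading, sketched below in plain Python, is a multi-hot encoding over equal-width buckets in which `slide_window_size` widens each value into a window that can cover more than one bucket. The function `bucket_feature` and these semantics are assumptions for illustration, not the behavior of any released DGL API.

```python
def bucket_feature(value, low, high, bucket_cnt, slide_window_size=0):
    """Multi-hot encode `value` over `bucket_cnt` equal-width buckets on [low, high].

    Assumed semantics (hypothetical): the value is widened to the window
    [value - slide_window_size / 2, value + slide_window_size / 2], clamped
    to [low, high], and every bucket from the one holding the window's lower
    end to the one holding its upper end is set to 1.
    """
    width = (high - low) / bucket_cnt
    half = slide_window_size / 2
    # Clamp both ends of the window to the configured range.
    lo = min(max(value - half, low), high)
    hi = min(max(value + half, low), high)
    # A value equal to `high` is assigned to the last bucket.
    first = min(int((lo - low) // width), bucket_cnt - 1)
    last = min(int((hi - low) // width), bucket_cnt - 1)
    return [1 if first <= i <= last else 0 for i in range(bucket_cnt)]


# Same settings as the 'age' feature above: range=[0, 100], 10 buckets, window 5.
age_24 = bucket_feature(24, 0, 100, 10, 5)  # window [21.5, 26.5]
age_29 = bucket_feature(29, 0, 100, 10, 5)  # window [26.5, 31.5]
# age_24 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
# age_29 -> [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
```

Under this reading, an age near a bucket boundary activates both neighboring buckets, which softens the hard cut-off between adjacent age ranges.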

Alternatives

Support loading data from pandas objects.

classicsong commented 4 years ago

We have moved it to https://github.com/classicsong/dgl-graphloader