Closed tengwei12315 closed 3 years ago
I'm not sure what you mean. Our README gives quite a few examples: https://github.com/dmlc/dgl/blob/master/examples/mxnet/sampling/README.md
Thank you for your answer. I want to run my own dataset using MXNet's sampling model. What should I do?
I have another question. Why does the NodeFlow and sampling tutorial show a broken image? Is there a problem with this model?
There's no problem with the model. The broken image is from doc building stages, and we will try to fix this soon.
Sorry, what is broken? The example code in the tutorial should work as long as you can load your data into a DGLGraph. Are you having trouble loading datasets?
I think @tengwei12315 is referring to the broken tutorial image at https://docs.dgl.ai/tutorials/models/index.html#training-on-giant-graphs.
How can I turn my Dataset into DGLGraph input? Where can I find a tutorial example?
You can convert your data into a NetworkX graph or a scipy sparse matrix and pass it to DGLGraph to construct one.
Hello, if my dataset has node features, how should I input them? If my dataset has no features, like Karate Club, how should I set the input features? Do you have any sample code? Thank you!
Unfortunately, I think we don't have example code for this. If your graph data has no node features, you can use one-hot encoding and get embeddings from an embedding matrix, just like many NLP tasks. Similarly, if your data set has characteristics as node data, you can also use one-hot encoding.
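As a rough illustration of the one-hot idea above: multiplying a one-hot vector by an embedding matrix simply selects one row of that matrix, which is why featureless nodes can be given a learnable embedding per node ID. The sketch below uses plain Python lists so it runs without any framework; the helper names (one_hot, matvec) and the embedding size are hypothetical. In PyTorch the same lookup is usually done with torch.nn.Embedding.

```python
import random


def one_hot(index, num_nodes):
    """Return a one-hot list of length num_nodes with 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(num_nodes)]


def matvec(vec, matrix):
    """Multiply a row vector of length n by an n x d matrix (list of rows)."""
    dim = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(dim)]


num_nodes, embed_dim = 34, 4  # Karate Club has 34 nodes; embed_dim is arbitrary
rng = random.Random(0)
# One row per node; in a real model these values would be trained.
embedding = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(num_nodes)]

node_id = 5
via_one_hot = matvec(one_hot(node_id, num_nodes), embedding)
via_lookup = embedding[node_id]  # same result without the multiplication
```

The resulting per-node vectors could then be stored as node features on the graph, for example under a key such as 'attr' in ndata.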
I find this is quite a common question. We should write a tutorial about how to prepare a custom dataset. This could be done together with our plan for the data format. @VoVAllen
Waiting for instructions on building a custom dataset.
What kind of scenario are you dealing with? If you simply have a graph with node features and you want to do some node classification with sampling-based training, then you can refer to this example. Basically you only need to have a graph object and a tensor for node features.
@mufeili I am facing something similar: I have a number of NetworkX graphs that have node and edge attributes, and I want to use these graphs for graph classification. The current examples in the documentation only show how to download and use pre-existing datasets, not how to start from your own list of NetworkX graphs.
You can create a list of DGLGraphs for your dataset from the NetworkX graphs with
# Assume networkx_graphs is a list of NetworkX graphs.
self.graphs = [dgl.from_networkx(nx_g) for nx_g in networkx_graphs]
For more details on loading node/edge attributes, see from_networkx.
@mufeili I am trying to follow this guide to make a graph classifier. I have a list of torch data objects which I feed into the dataloader using dataloader = DataLoader(graphs,batch_size=1024,collate_fn=collate,drop_last=False,shuffle=True). Whether the graphs here are DGLGraphs or torch data objects, the dataloader shows num_samples = 0. Apart from using the dataloader, I don't know how to feed the data. This requires the data object to have feature, label, and mask attributes, which I am not sure how to assign.
Have you checked the user guide on graph classification? How did you define collate?
I fixed the error for number of samples being 0 and other basic issues. I made my functions similar to the tutorial.
def collate(samples):
    graphs, labels = map(list, zip(*samples))
    batched_graph = dgl.batch(graphs)
    batched_labels = torch.tensor(labels)
    return batched_graph, batched_labels
dataloader = DataLoader(train_dataset,batch_size=1024,collate_fn=collate,drop_last=False,shuffle=True)
And the training loop
for epoch in range(20):
    for batched_graph, labels in dataloader:
But this is creating an issue
AttributeError: 'MultiDiGraph' object has no attribute 'is_block'
You need to convert the NetworkX graphs into DGLGraphs first. MultiDiGraph is a class for directed multigraphs in NetworkX.
@mufeili Also, my graphs do not have any features associated with them, only nodes and edges, so for a graph
Graph(num_nodes=410, num_edges=1500, ndata_schemes={} edata_schemes={})
graphs[0].ndata gives {}. What changes should I make to bypass this?
feats = batched_graph.ndata['attr'].float()
logits = model(batched_graph, feats)
I've tried passing empty/filled tensors and lists, but both seem to give some error.
You can load node attributes by specifying node_attrs when using dgl.from_networkx.
@mufeili I was trying to talk about graph attributes, not node attributes.
For graph attributes, you can treat them as additional labels and process them in the same way as graph labels.
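One possible way to carry graph-level attributes alongside the labels is to extend the collate function so that each sample is a (graph, label, graph_attr) triple. The sketch below is only an illustration: make_collate and its two parameters are hypothetical names, and plain strings and lists stand in for graphs and tensors so that it runs without DGL or PyTorch. With DGL, batch_graphs would typically be dgl.batch, and stack could be torch.tensor.

```python
def make_collate(batch_graphs, stack):
    """Build a collate function for samples of the form (graph, label, graph_attr).

    batch_graphs: callable that merges a list of graphs into one batch
                  (dgl.batch when working with DGLGraphs).
    stack: callable that turns a list of per-graph values into one batch object
           (for example torch.tensor when working with PyTorch).
    """
    def collate(samples):
        # Unzip the list of triples into three parallel lists.
        graphs, labels, attrs = map(list, zip(*samples))
        return batch_graphs(graphs), stack(labels), stack(attrs)
    return collate


# Stand-in data: strings play the role of graphs, lists the role of tensors.
samples = [("g0", 0, [0.5, 1.0]), ("g1", 1, [0.1, 0.2]), ("g2", 0, [0.9, 0.3])]
collate = make_collate(batch_graphs=list, stack=list)
batched_graph, batched_labels, batched_attrs = collate(samples)
```

The training loop would then unpack three values per batch instead of two and pass the batched attributes to the model or loss as needed.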
@mufeili How can I implement k-fold validation on DGL graphs?
from sklearn.model_selection import StratifiedKFold
kfold = StratifiedKFold(n_splits=3,shuffle=True, random_state=1337)
for train, test in kfold.split(data, labels):
    train_data = list(zip(data[train], labels[train]))
    test_data = list(zip(data[test], labels[test]))
Here data is the list of DGL graphs, and it throws the error ValueError: only one element tensors can be converted to Python scalars
What are data and labels? Can you provide a toy example for that? I guess you need to manually implement k-fold cross validation rather than use StratifiedKFold from scikit-learn.
data[0] = DGLGraph(num_nodes=57211, num_edges=136670,
ndata_schemes={}
edata_schemes={'norm': Scheme(shape=(), dtype=torch.float32), 'rel_type': Scheme(shape=(17,), dtype=torch.float64)})
labels[0]=torch.tensor([0,1])
How many graphs do you have? What is the shape of labels? Is labels for node classification?
@mufeili There are 551 graphs. labels is the list of tensors for graph classification, so its length is 551.
Assuming we follow the standard practice for developing a custom PyTorch dataset, the dataset class needs to look like this:
class Dataset:
    def __init__(self):
        ...

    def __getitem__(self, idx):
        """
        Returns
        -------
        DGLGraph
            The i-th graph.
        labels
            The labels for the i-th datapoint.
        """

    def __len__(self):
        """
        Returns
        -------
        int
            The size for the dataset.
        """
You can then implement k-fold splitting as follows:
import random

class Subset(object):
    """Subset of a dataset at specified indices

    Code adapted from PyTorch.

    Parameters
    ----------
    dataset
        dataset[i] should return the ith datapoint
    indices : list
        List of datapoint indices to construct the subset
    """
    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, item):
        """Get the datapoint indexed by item

        Returns
        -------
        tuple
            datapoint
        """
        return self.dataset[self.indices[item]]

    def __len__(self):
        """Get subset size

        Returns
        -------
        int
            Number of datapoints in the subset
        """
        return len(self.indices)

def k_fold_split(dataset, k, shuffle=True):
    """
    Parameters
    ----------
    dataset
        An instance for the Dataset class defined above.
    k : int
        The number of folds.
    shuffle : bool
        Whether to shuffle the dataset before performing a k-fold split.

    Returns
    -------
    list of length k
        Each element is a tuple (train_set, val_set) corresponding to a fold.
    """
    assert k >= 2, 'Expect the number of folds to be no smaller than 2, got {:d}'.format(k)
    all_folds = []
    indices = list(range(len(dataset)))
    if shuffle:
        random.shuffle(indices)
    data_size = len(dataset)
    for i in range(k):
        # Integer arithmetic keeps the fold boundaries valid as slice indices.
        val_start = data_size * i // k
        val_end = data_size * (i + 1) // k
        val_indices = indices[val_start: val_end]
        val_subset = Subset(dataset, val_indices)
        train_indices = indices[:val_start] + indices[val_end:]
        train_subset = Subset(dataset, train_indices)
        all_folds.append((train_subset, val_subset))
    return all_folds
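The split above ignores class balance. Since the question started from StratifiedKFold, here is a minimal sketch of a stratified variant that works on indices only, so it never has to convert graphs or tensors to arrays. It assumes each graph has a single hashable class ID (for example an int obtained from the label tensor beforehand); the function name stratified_k_fold_indices and the seed parameter are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold_indices(labels, k, seed=0):
    """Split range(len(labels)) into k folds with roughly equal class proportions.

    labels: list of hashable class IDs, one per graph.
    Returns a list of k (train_indices, val_indices) tuples.
    """
    assert k >= 2, 'Expect at least 2 folds, got {:d}'.format(k)
    rng = random.Random(seed)

    # Group datapoint indices by class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal the indices of each class round-robin over the folds.
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)

    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, val))
    return splits


labels = [0] * 6 + [1] * 3  # toy labels: six graphs of class 0, three of class 1
splits = stratified_k_fold_indices(labels, k=3)
```

Each pair of index lists can then be wrapped with the Subset class from the reply above to obtain the training and validation sets for a fold.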
❓ Questions and Help
Hi, in the MXNet sampling code example, can I load my own dataset, like in SSE? If so, what should I do? Any answer will be appreciated.