dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[DataLoader] User-defined Dataloader Problem #7696

Closed (jalencato closed this 1 week ago)

jalencato commented 1 month ago

🐛 Bug

GraphStorm uses its own user-defined sampler to create a dataloader. The code looks like this:

        sampler = MultiLayerNeighborSamplerForReconstruct(sampler,
                    dataset, construct_feat_ntype, construct_feat_fanout)
        loader = dgl.dataloading.DistNodeDataLoader(g, target_idx, sampler,
            batch_size=batch_size, shuffle=train_task)

The error log is:

[rank3]: Traceback (most recent call last):
[rank3]:   File "/graphstorm/python/graphstorm/run/gsgnn_np/gsgnn_np.py", line 209, in <module>
[rank3]:     main(gs_args)
[rank3]:   File "/graphstorm/python/graphstorm/run/gsgnn_np/gsgnn_np.py", line 118, in main
[rank3]:     dataloader = GSgnnNodeDataLoader(train_data, train_idxs,
[rank3]:   File "/graphstorm/python/graphstorm/dataloading/dataloading.py", line 1645, in __init__
[rank3]:     self.dataloader = self._prepare_dataloader(dataset,
[rank3]:   File "/graphstorm/python/graphstorm/dataloading/dataloading.py", line 1673, in _prepare_dataloader
[rank3]:     loader = dgl.dataloading.DistNodeDataLoader(g, target_idx, sampler,
[rank3]:   File "/opt/gs-venv/lib/python3.10/site-packages/dgl/dataloading/dist_dataloader.py", line 641, in __init__
[rank3]:     self.collator = NodeCollator(g, nids, graph_sampler, **collator_kwargs)
[rank3]:   File "/opt/gs-venv/lib/python3.10/site-packages/dgl/dataloading/dist_dataloader.py", line 206, in __init__
[rank3]:     Collator.add_edge_attribute_to_graph(self.g, self.graph_sampler.prob)
[rank3]: AttributeError: 'MultiLayerNeighborSamplerForReconstruct' object has no attribute 'prob'

Expected behavior

The whole process works with DGL 1.1.3 but fails with DGL >= 2.3.0 (and perhaps with any 2.0+ version). The line that causes the problem is here: https://github.com/dmlc/dgl/blob/d650422402aa770fbf7ec05e7230405c45a3bdfa/python/dgl/dataloading/dist_dataloader.py#L205

rudongyu commented 1 month ago

Thanks for reporting the bug. As a temporary workaround, you can add a placeholder attribute to the customized sampler by setting self.prob = None, for example in its __init__.
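
A minimal sketch of that workaround follows. It does not import DGL or GraphStorm: CustomSampler, PatchedSampler and ensure_prob are hypothetical stand-ins for illustration only. The only detail taken from the report is that the collator reads graph_sampler.prob, so the attribute has to exist on the sampler object.

```python
class CustomSampler:
    """Hypothetical stand-in for a user-defined sampler that lacks `prob`."""

    def __init__(self, fanouts):
        self.fanouts = fanouts


class PatchedSampler(CustomSampler):
    """The same sampler with the workaround applied in the constructor."""

    def __init__(self, fanouts):
        super().__init__(fanouts)
        # Workaround from this thread: define the attribute that the
        # traceback shows being read (self.graph_sampler.prob).
        self.prob = None


def ensure_prob(sampler):
    """Hypothetical helper: patch an existing instance without editing its class."""
    if not hasattr(sampler, "prob"):
        sampler.prob = None
    return sampler


if __name__ == "__main__":
    unpatched = CustomSampler([10, 10])
    print(hasattr(unpatched, "prob"))               # attribute is missing here
    print(hasattr(ensure_prob(unpatched), "prob"))  # present after patching
    print(PatchedSampler([10, 10]).prob)
```

If editing the sampler class is not convenient, calling something like ensure_prob(sampler) just before constructing the dataloader should have the same effect, assuming nothing else in the sampler relies on prob being set to a real edge attribute name.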