dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

Issue with prob_name in graphbolt neighbor sampling #7495

Closed nicksukie closed 5 days ago

nicksukie commented 5 days ago

I'm trying to sample neighbors from a graph using graphbolt and some pre-calculated probabilities.

My probabilities tensor is stored as an edge attribute of the graph. When I run print(self.graph.edata['sim']), the tensor shows up clearly:

tensor([0.0962, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000])

However, when I attempt neighbor sampling, my probabilities tensor is not recognized.


        indptr = self.graph.adj_tensors('csc')[0]
        indices = self.graph.adj_tensors('csc')[1]
        fused_graph = gb.fused_csc_sampling_graph(indptr, indices)
        seed_tensor = seed_nodes.unsqueeze(0) if seed_nodes.dim() == 1 else seed_nodes
        item_set = gb.ItemSet(seed_tensor, names="seeds")
        datapipe = gb.ItemSampler(item_set, batch_size=len(seed_nodes))
        datapipe = datapipe.sample_neighbor(fused_graph, fanouts, replace=False, prob_name='sim' if self.sim_aggregate else None)

Error:

Traceback (most recent call last):
  File "/data/user/repo1/main.py", line 51, in <module>
    score_pos, score_neg = model(graph_pos, graph_neg)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user/repo1/models/models.py", line 71, in forward
    scores_pos = self.compute_scores(users_pos, items_pos)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user/repo1/models/models.py", line 84, in compute_scores
    data_flows, node_index, user_indices, item_indices = self.graph_manager.point_sample_neighbors(users, items)
                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user/repo1/models/_utils.py", line 438, in point_sample_neighbors
    node_flow = self.sample_neighbors(seed_nodes)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user/repo1/models/_utils.py", line 487, in sample_neighbors
    sampled_subgraphs = next(iter(datapipe)).sampled_subgraphs
                        ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/iter/callable.py", line 125, in __iter__
    yield self._apply_fn(data)
          ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/torch/utils/data/datapipes/iter/callable.py", line 90, in _apply_fn
    return self.fn(data)
           ^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/dgl/graphbolt/minibatch_transformer.py", line 38, in _transformer
    minibatch = self.transformer(minibatch)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/dgl/graphbolt/impl/neighbor_sampler.py", line 175, in _sample_per_layer
    subgraph = self.sampler(
               ^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 629, in sample_neighbors
    C_sampled_subgraph = self._sample_neighbors(
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 733, in _sample_neighbors
    self._check_sampler_arguments(nodes, fanouts, probs_name)
  File "/usr/local/anaconda3/envs/torch-2.3-python311/lib/python3.11/site-packages/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 665, in _check_sampler_arguments
    probs_name in self.edge_attributes
TypeError: argument of type 'NoneType' is not iterable
This exception is thrown by __iter__ of SamplePerLayer(datapipe=MiniBatchTransformer, fanout=tensor([20]), prob_name='sim', replace=False, sampler=<bound method FusedCSCSamplingGraph.sample_neighbors of FusedCSCSamplingGraph(csc_indptr=tensor([     0,     11,     27,  ..., 220356, 220358, 220360]),
                      indices=tensor([ 6053,  6051,  6061,  ..., 15028,  5988, 15029]),
                      total_num_nodes=15030, num_edges=220360,)>)

Perhaps the probabilities have to be converted into the fused_csc_sampling_graph format, but what prob_name should refer to is not clearly specified anywhere, other than that it is a string.

It's worth noting that I'm migrating my code from an older version of DGL, in which neighbor sampling was done via dgl.contrib.sampling.NeighborSampler. The old method works like a charm, but unfortunately that version is not compatible with my current codebase. I also have not found any explanation of how to migrate neighbor sampling code from contrib to graphbolt.

Any insights or assistance is very much appreciated.

Regards.


mfbalin commented 5 days ago

Change your code to:

indptr = self.graph.adj_tensors('csc')[0]
indices = self.graph.adj_tensors('csc')[1]
fused_graph = gb.fused_csc_sampling_graph(indptr, indices, edge_attributes={'sim': self.graph.edata['sim']})
seed_tensor = seed_nodes.unsqueeze(0) if seed_nodes.dim() == 1 else seed_nodes
item_set = gb.ItemSet(seed_tensor, names="seeds")
datapipe = gb.ItemSampler(item_set, batch_size=len(seed_nodes))
datapipe = datapipe.sample_neighbor(fused_graph, fanouts, replace=False, prob_name='sim' if self.sim_aggregate else None)
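
The traceback above ends at probs_name in self.edge_attributes, which is consistent with edge_attributes being None when the graph is built without that argument. A minimal plain-Python sketch of that failure, assuming nothing about DGL internals (check_probs_name is a hypothetical stand-in, not DGL's code):

```python
def check_probs_name(edge_attributes, probs_name):
    """Hypothetical stand-in for the membership check seen in the traceback."""
    if probs_name is None:
        return True
    # Raises TypeError when edge_attributes is None.
    return probs_name in edge_attributes

# Graph built without edge attributes: the membership test fails.
try:
    check_probs_name(None, "sim")
    error = None
except TypeError as exc:
    error = str(exc)

# Graph built with edge_attributes={'sim': ...}: the name is found.
ok = check_probs_name({"sim": [0.0962, 1.0, 1.0]}, "sim")
print(error, ok)
```

With a dictionary containing 'sim', the lookup succeeds, which matches the effect of passing edge_attributes in the code above.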

mfbalin commented 5 days ago

The edge_attributes parameter of gb.fused_csc_sampling_graph is used to initialize the edge attributes of dgl.graphbolt.FusedCSCSamplingGraph.
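
To illustrate what a per-edge attribute such as 'sim' means for sampling, here is a rough plain-Python sketch of weighted in-neighbor sampling without replacement over CSC arrays. It is not GraphBolt's implementation; sample_in_neighbors and the toy graph are hypothetical, and it assumes weights[e] belongs to the edge stored at position e of indices. Whether edata['sim'] is already in that order for a given graph is not something this thread confirms.

```python
import random

def sample_in_neighbors(indptr, indices, weights, seed, fanout, rng):
    """Pick up to `fanout` distinct in-neighbors of `seed`, weighted by `weights`."""
    start, end = indptr[seed], indptr[seed + 1]
    # Candidate edge positions; zero-weight edges can never be picked.
    candidates = [e for e in range(start, end) if weights[e] > 0]
    chosen = []
    while candidates and len(chosen) < fanout:
        pick = rng.choices(candidates, weights=[weights[e] for e in candidates])[0]
        candidates.remove(pick)
        chosen.append(pick)
    return [indices[e] for e in chosen]

# Toy graph with 4 nodes in CSC form: column v lists the sources of edges into v.
indptr = [0, 2, 5, 6, 6]
indices = [1, 2, 0, 2, 3, 0]
sim = [0.1, 1.0, 1.0, 0.0, 0.5, 1.0]  # one weight per entry of `indices`

rng = random.Random(0)
picked_for_1 = sample_in_neighbors(indptr, indices, sim, 1, 2, rng)
picked_for_3 = sample_in_neighbors(indptr, indices, sim, 3, 2, rng)
picked_for_0 = sample_in_neighbors(indptr, indices, sim, 0, 1, rng)
print(picked_for_1, picked_for_3, picked_for_0)
```

Node 1 has three incoming edges, but the one from node 2 has weight 0, so only nodes 0 and 3 can ever be returned for it.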

mfbalin commented 5 days ago

Also, I would recommend trying sample_layer_neighbor in place of sample_neighbor to see if it does the job for you. It is a drop-in replacement. More information about it can be found here: https://docs.dgl.ai/en/latest/generated/dgl.graphbolt.LayerNeighborSampler.html#dgl.graphbolt.LayerNeighborSampler

mfbalin commented 5 days ago

If you have a CUDA-enabled GPU, I would insert datapipe = datapipe.copy_to('cuda') right after the ItemSampler line, so that the sampling operation can run on your GPU. For that, you need to move fused_graph to either pinned memory or GPU memory.

nicksukie commented 5 days ago

The edge_attributes parameter of gb.fused_csc_sampling_graph is used to initialize the edge attributes of dgl.graphbolt.FusedCSCSamplingGraph.

This did solve my issue. Thank you.

May I ask what the benefit is of using sample_layer_neighbor in place of sample_neighbor, and how this affects the output graph format?

Thanks again

mfbalin commented 5 days ago

This did solve my issue. Thank you.

May I ask what the benefit is of using sample_layer_neighbor in place of sample_neighbor, and how this affects the output graph format?

Thanks again

The output graph format is exactly the same. sample_layer_neighbor correlates the sampling procedures of your vertices so that the sampled neighborhoods have more overlap. With multilayer sampling, you end up with significantly fewer sampled nodes and edges, which improves training throughput. The model convergence is unaffected by this difference.
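
A toy sketch of why correlated sampling increases overlap, in plain Python. This is only an illustration of the shared-randomness idea under simplifying assumptions, not DGL's algorithm: each candidate is kept with probability fanout / degree, so the number of neighbors per seed is only fanout on average, and count_unique_sources is a hypothetical helper.

```python
import random

def count_unique_sources(neighbors, fanout, shared, rng):
    """Count distinct sampled source nodes over all seeds.

    neighbors: dict mapping seed -> list of candidate source nodes.
    shared: if True, each source node reuses one random draw for all seeds.
    """
    draw_per_source = {}
    sampled = set()
    for seed, candidates in neighbors.items():
        if not candidates:
            continue
        keep_prob = min(1.0, fanout / len(candidates))
        for src in candidates:
            if shared:
                # Correlated: one draw per source node, reused by every seed.
                if src not in draw_per_source:
                    draw_per_source[src] = rng.random()
                r = draw_per_source[src]
            else:
                # Independent: a fresh draw for every (seed, source) pair.
                r = rng.random()
            if r <= keep_prob:
                sampled.add(src)
    return len(sampled)

# 50 seeds that all share the same 200 candidate neighbors, fanout 10.
neighbors = {seed: list(range(200)) for seed in range(50)}
independent = count_unique_sources(neighbors, 10, False, random.Random(0))
correlated = count_unique_sources(neighbors, 10, True, random.Random(0))
print(independent, correlated)
```

With shared draws, the same few source nodes are selected by every seed, so the union of sampled nodes stays small; with independent draws, almost every candidate ends up selected by at least one seed.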

More information can be found here: https://neurips.cc/virtual/2023/poster/71999

For optimal performance, you should consider performing the sampling and feature fetch operations on the GPU by placing a copy_to in your sampling pipeline before these operations.

nicksukie commented 5 days ago

Understood. Thanks for sharing. I will look into this.

nicksukie commented 4 days ago

I have a follow-up issue. Not sure if you are able to help me with this one too @mfbalin:

Essentially, I want to know how to use GraphBolt for neighborhood aggregation. In older versions, it looked like this (where data_flows is the output of dgl.contrib.sampling.NeighborSampler):

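import dgl.function as fn
import torch.nn.functional as F

def encode(self, data_flows, training=True):
    x = self.embeddings
    # data_flows is the iterable returned by dgl.contrib.sampling.NeighborSampler;
    # take the first NodeFlow and copy node/edge data from the parent graph.
    nf = next(iter(data_flows))
    nf.copy_from_parent()
    # Look up the input embeddings for the nodes of the first layer.
    nf.layers[0].data['activation'] = x[nf.layers[0].data['feature']]
    for i, layer in enumerate(self.layers):
        h = nf.layers[i].data.pop('activation')
        h = F.dropout(h, p=self.dropout, training=training)
        nf.layers[i].data['h'] = h
        # Mean-aggregate source features into the next layer, then apply `layer`
        # (which presumably writes 'activation' on the destination nodes).
        nf.block_compute(i,
                         fn.copy_src(src='h', out='m'),
                         lambda node: {'h': node.mailbox['m'].mean(dim=1)},
                         layer)

    h = nf.layers[-1].data.pop('activation')

    return h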

But GraphBolt doesn't provide many of these functions. Any guidance would be appreciated.

Btw, I have also posted my question here: https://discuss.dgl.ai/t/neighbor-sampling-and-aggregation-with-graphbolt/4457

mfbalin commented 4 days ago

The .blocks property of MiniBatch (https://docs.dgl.ai/en/latest/generated/dgl.graphbolt.MiniBatch.html#dgl.graphbolt.MiniBatch.blocks) returns the DGL data structures that you can use to do model computations.

Example use: https://github.com/dmlc/dgl/blob/489671cd43a08b824b835412b3f365b16e834983/examples/sampling/graphbolt/node_classification.py#L313
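
As a rough picture of what the old copy_src plus mean step computes once per block, here is a plain-Python sketch. The block representation (a list of (src, dst) pairs with block-local IDs plus a destination count) and the mean_aggregate helper are hypothetical and only stand in for the structures returned by .blocks; this is not DGL code.

```python
def mean_aggregate(block_edges, num_dst, h_src):
    """One layer of mean aggregation over a bipartite block.

    block_edges: list of (src, dst) pairs with block-local IDs.
    h_src: list of feature vectors, one per source node.
    """
    dim = len(h_src[0])
    sums = [[0.0] * dim for _ in range(num_dst)]
    counts = [0] * num_dst
    for src, dst in block_edges:
        counts[dst] += 1
        for j in range(dim):
            sums[dst][j] += h_src[src][j]
    # Destination nodes without incoming edges get a zero vector.
    return [[v / c if c else 0.0 for v in row] for row, c in zip(sums, counts)]

# Two blocks, outermost first: 4 sources -> 2 destinations -> 1 seed.
blocks = [
    ([(0, 0), (1, 0), (2, 1), (3, 1)], 2),
    ([(0, 0), (1, 0)], 1),
]
h = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
for edges, num_dst in blocks:
    # The destinations of one block are the sources of the next.
    h = mean_aggregate(edges, num_dst, h)
print(h)  # [[4.0, 3.0]]
```

The loop mirrors the per-layer structure of the old encode function: the output of each block feeds the next, and the last block yields one vector per seed node.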