dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0
13.53k stars 3.02k forks source link

Fused Neigbhor Sampling reproducibility issue #7831

Open taran2210 opened 1 week ago

taran2210 commented 1 week ago

Setting the seed and repeating the fused neighborhood sampling for a source code does not reproduce the same subgraph, have identified a fix that will be slower but allow reproducible subgraphs

To Reproduce

Steps to reproduce the behavior:

  1. Set OMP_NUM_THREADS to >= 2
  2. Define NeighborSampler with fused=True
  3. Set dgl.seed, dgl.random.seed before calling dgl.dataloading.NeighborSampler.sample_blocks
  4. Repeat step 3. and compare blocks[0].srcdata['feat']

The following change allowed for the above to have the same output

diff --git a/src/graph/sampling/neighbor/neighbor.cc b/src/graph/sampling/neighbor/neighbor.cc
index 5393a200..471600dd 100644
--- a/src/graph/sampling/neighbor/neighbor.cc
+++ b/src/graph/sampling/neighbor/neighbor.cc
@@ -14,6 +14,7 @@

 #include <tuple>
 #include <utility>
+#include <execution>

 #include "../../../array/cpu/concurrent_id_hash_map.h"
 #include "../../../c_api_common.h"
@@ -488,6 +489,7 @@ SampleNeighborsFused(
         }
       }
       IdType offset = new_nodes_vec[lhs_node_type].size();
+      auto length = offset;
       new_nodes_vec[lhs_node_type].resize(global_prefix_col.back());
       for (int thread_id = 0; thread_id < num_threads_col; ++thread_id) {
         memcpy(
@@ -496,6 +498,7 @@ SampleNeighborsFused(
             src_nodes_local[thread_id].size() * sizeof(IdType));
         offset += src_nodes_local[thread_id].size();
       }
+    std::sort(std::execution::par_unseq, new_nodes_vec[lhs_node_type].begin() + length, new_nodes_vec[lhs_node_type].end());
     }
   }

This results is the sampling being slower but reproducible, would there be any alternative for multithreaded fused sampling being reproducible?

Environment

frozenbugs commented 1 day ago

Yes, this is a known issue, our implementation of concurrent_id_hash_map can't guarantee deterministic while maintain the high performance. Feel free to file a PR and add a if-else branch to use deterministic solution while user specify the random seed or add other flag to control the behavoir.

https://github.com/dmlc/dgl/blob/master/graphbolt/src/concurrent_id_hash_map.cc