taran2210 opened this issue 3 weeks ago
Yes, this is a known issue: our implementation of concurrent_id_hash_map cannot guarantee determinism while maintaining high performance. Feel free to file a PR that adds an if-else branch to use a deterministic solution when the user specifies a random seed, or adds another flag to control the behavior.
https://github.com/dmlc/dgl/blob/master/graphbolt/src/concurrent_id_hash_map.cc
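A minimal Python sketch of why a concurrent ID map is order-dependent and what the suggested if-else branch could look like. It assumes a simplified model of the hash map: seed nodes get IDs 0..k-1 and every other node gets the next counter value when it is first inserted, so the mapping depends on which thread inserts first. The names compact_ids, compact_first_come and compact_deterministic are hypothetical, and numbering the non-seed IDs in sorted order is one possible deterministic choice, not what concurrent_id_hash_map.cc does.

```python
from typing import Dict, List


def compact_first_come(ids: List[int], seeds: List[int]) -> Dict[int, int]:
    """Hypothetical model of the concurrent map: seeds get 0..k-1 (assumed
    unique), other IDs get the next counter value in arrival order."""
    mapping = {s: i for i, s in enumerate(seeds)}
    for node in ids:  # arrival order varies between runs when threads race
        if node not in mapping:
            mapping[node] = len(mapping)
    return mapping


def compact_deterministic(ids: List[int], seeds: List[int]) -> Dict[int, int]:
    """Seeds keep 0..k-1; remaining unique IDs are numbered in sorted order,
    so the result does not depend on arrival order."""
    mapping = {s: i for i, s in enumerate(seeds)}
    for node in sorted(set(ids) - set(seeds)):
        mapping[node] = len(mapping)
    return mapping


def compact_ids(ids: List[int], seeds: List[int], deterministic: bool) -> Dict[int, int]:
    # The if-else branch: take the slower deterministic path, e.g. when a
    # random seed is set or a (hypothetical) flag asks for reproducibility.
    if deterministic:
        return compact_deterministic(ids, seeds)
    return compact_first_come(ids, seeds)


# Two runs that insert the same neighbors in a different order:
run_a = compact_ids([7, 3, 9, 3], seeds=[5], deterministic=False)
run_b = compact_ids([9, 7, 3], seeds=[5], deterministic=False)
print(run_a == run_b)  # False: local IDs depend on insertion order
```

With deterministic=True both runs return the same mapping. The cost is the sort (or an equivalent ordered pass), which is the kind of slowdown the fix described below trades for reproducibility.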
Setting the seed and repeating fused neighborhood sampling (DGL built from source) does not reproduce the same subgraph. I have identified a fix that is slower but produces reproducible subgraphs.
To Reproduce
Steps to reproduce the behavior:
The following change made repeated runs of the above produce the same output:
This makes sampling slower but reproducible. Is there any alternative that keeps multithreaded fused sampling reproducible?
Environment
How you installed DGL (conda, pip, source): source