IntelligentSoftwareSystems / Galois

Galois: C++ library for multi-core and multi-node parallelization
http://iss.ices.utexas.edu/?p=projects/galois

Cannot seem to get correct results for Connected Components #408

Open altanh opened 1 year ago

altanh commented 1 year ago

Hi, I've been trying to run the connected components application provided by Galois (both local CPU and distributed CPU) and cannot seem to get correct results unless I use the Serial implementation.

I ran the standard CMake setup on the release-6.0 branch, with a few small hiccups.

CMake command: cmake -S . -B _build/ -DCMAKE_BUILD_TYPE=Release -DGALOIS_ENABLE_DIST=1 -DMPI_EXECUTABLE_SUFFIX=".mpich"

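Commands I ran (the command line for the distributed run is recorded in the CommandLine entry of the output below):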

      D-Galois Benchmark Suite v6.0.0 (unknown)
      Copyright (C) 2018 The University of Texas at Austin
      http://iss.ices.utexas.edu/galois/

      application: ConnectedComp - Distributed Heterogeneous with filter.
      ConnectedComp on Distributed Galois.

      [0] Master distribution time : 6e-06 seconds to read 56 bytes in 6 seeks (9.33333 MBPS)
      [0] Starting graph reading.
      [0] Reading graph complete.
      [0] Edge inspection time: 1.1e-05 seconds to read 3392 bytes (308.364 MBPS)
      Loading edge-data while creating edges
      [0] Edge loading time: 2e-06 seconds to read 3392 bytes (1696 MBPS)
      [0] Graph construction complete.
      [0] InitializeGraph::go called
      [0] ConnectedComp::go run 0 called
      Number of components is 100
      [0] ConnectedComp::go run 1 called
      Number of components is 100
      [0] ConnectedComp::go run 2 called
      Number of components is 100
      STAT_TYPE, HOST_ID, REGION, CATEGORY, TOTAL_TYPE, TOTAL
      STAT, 0, dGraph_Generic, CuSPStateRounds, HOST_0, 100
      STAT, 0, Gluon, ReduceNumMessages_ConnectedComp_0, HSUM, 0
      STAT, 0, Gluon, ReduceNumMessages_ConnectedComp_1, HSUM, 0
      STAT, 0, Gluon, ReduceNumMessages_ConnectedComp_2, HSUM, 0
      STAT, 0, ConnectedComp, NumWorkItems_0, HSUM, 100
      STAT, 0, ConnectedComp, NumIterations_0, HMAX, 3
      STAT, 0, ConnectedComp, NumWorkItems_1, HSUM, 100
      STAT, 0, ConnectedComp, NumIterations_1, HMAX, 3
      STAT, 0, ConnectedComp, NumWorkItems_2, HSUM, 100
      STAT, 0, ConnectedComp, NumIterations_2, HMAX, 3
      STAT, 0, Gluon, ReplicationFactor, HOST_0, 1
      PARAM, 0, DistBench, CommandLine, HOST_0, _build/lonestar/analytics/distributed/connected-components/connected-components-push-dist --symmetricGraph ../../inputs/rand_cc_5.gr
      PARAM, 0, DistBench, Threads, HOST_0, 1
      PARAM, 0, DistBench, Hosts, HOST_0, 1
      PARAM, 0, DistBench, Runs, HOST_0, 3
      PARAM, 0, DistBench, Run_UUID, HOST_0, acc9e1df-4dc8-49e1-9b04-59a390ce0c1c
      PARAM, 0, DistBench, Input, HOST_0, ../../inputs/rand_cc_5.gr
      PARAM, 0, DistBench, PartitionScheme, HOST_0, oec
      PARAM, 0, DistBench, Hostname, HOST_0, redpoint
      PARAM, 0, ConnectedComp, Max Iterations, HOST_0, 1000
      PARAM, 0, dGraph, GenericPartitioner, HOST_0, 0

Any help would be greatly appreciated. I have attached the input files (in both .mtx and .gr formats). Note that I have locally verified the correctness of the graph format conversion by round-tripping the graph and then running a simple isomorphism check. I've tried a few graphs with no success, so I doubt the specific input matters.
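
For anyone who wants to cross-check the expected component count independently of Galois, here is a minimal sketch of a union-find reference counter. It is not Galois code. It assumes the .mtx input is a Matrix Market coordinate file with 1-based vertex IDs, and the helper names `read_mtx_edges` and `count_components` are hypothetical, introduced only for this illustration.

```python
from typing import Iterable, List, Tuple


def read_mtx_edges(lines: Iterable[str]) -> Tuple[int, List[Tuple[int, int]]]:
    """Parse Matrix Market coordinate text into (num_vertices, 0-based edges).

    Assumption: '%' lines are comments, the first remaining line is the
    size line 'rows cols nnz', and every later line starts with 'row col'.
    """
    body = [ln.split() for ln in lines if ln.strip() and not ln.startswith("%")]
    rows, cols = int(body[0][0]), int(body[0][1])
    edges = [(int(f[0]) - 1, int(f[1]) - 1) for f in body[1:]]
    return max(rows, cols), edges


def count_components(num_vertices: int, edges: Iterable[Tuple[int, int]]) -> int:
    """Count connected components of an undirected graph with union-find."""
    parent = list(range(num_vertices))

    def find(x: int) -> int:
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two sets
            components -= 1
    return components


# Tiny inline example: vertices {1,2,3} are connected, as are {4,5}.
sample = """%%MatrixMarket matrix coordinate pattern symmetric
5 5 3
2 1
3 2
5 4
"""
n, edges = read_mtx_edges(sample.splitlines())
print(count_components(n, edges))  # prints 2
```

To apply it to a real file, pass `open(path)` (with whatever path the .mtx file has locally) to `read_mtx_edges` instead of the inline sample, and compare the result with the "Number of components" line reported by the application.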

I am happy to provide any system information or build output as needed. Thanks!

Attachment: rand_cc_5.zip