IntelligentSoftwareSystems / Galois

Galois: C++ library for multi-core and multi-node parallelization
http://iss.ices.utexas.edu/?p=projects/galois

BFS hangs #398

Open AKKamath opened 2 years ago

AKKamath commented 2 years ago

I ran LonestarGPU's implementation of BFS on the Hollywood input data set (https://suitesparse-collection-website.herokuapp.com/MM/LAW/hollywood-2009.tar.gz). The application appears to hang. I added printf statements to the code, and the hang appears to occur in "bfs_kernel".

Full output is listed below:

./bfs -l -s 570170 ./Galois/build/tools/graph-convert/hollywood-2009.gr 
OPTIONS: coop_conv=False $ outline_iterate_gb=True $ backoff_blocking_factor=4 $ parcomb=True $ np_schedulers=set(['fg', 'tb', 'wp']) $ cc_disable=set([]) $ tb_lb=True $ hacks=set([]) $ np_factor=8 $ instrument=set([]) $ unroll=[] $ read_props=None $ outline_iterate=True $ ignore_nested_errors=False $ np=True $ write_props=None $ quiet_cgen=True $ retry_backoff=True $ cuda.graph_type=basic $ cuda.use_worklist_slots=True $ cuda.worklist_type=basic
nnodes=1139905, nedges=57515616, sizeEdge=4.
Host memory for graph: 447 MB
read 469244200 bytes in 353 ms (1329.30 MB/s)
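
One way to rule out a truncated or misread input is to check that the byte count in the log matches the graph dimensions printed just above it. The sketch below assumes the version 1 binary .gr layout: four 64-bit header words, one 64-bit out-index per node, one 32-bit destination per edge padded to an 8-byte boundary, then `sizeEdge` bytes of data per edge. That layout is an assumption and is not stated in this thread; the function name is hypothetical.

```python
def expected_gr_size(nnodes: int, nedges: int, size_edge: int) -> int:
    """Expected size in bytes of a binary .gr file (assumed version 1 layout)."""
    header = 4 * 8            # assumed: version, sizeEdge, nnodes, nedges as uint64
    out_index = nnodes * 8    # assumed: one uint64 end offset per node
    dests = nedges * 4        # assumed: one uint32 destination per edge
    dests += (-dests) % 8     # pad the destination array to an 8-byte boundary
    edge_data = nedges * size_edge
    return header + out_index + dests + edge_data


if __name__ == "__main__":
    # Values taken from the log above: nnodes, nedges, sizeEdge, bytes read.
    size = expected_gr_size(1139905, 57515616, 4)
    print(size, size == 469244200)  # prints "469244200 True"
```

Under that assumption the computed size equals the 469244200 bytes reported as read, which suggests the graph file was loaded in full before the hang.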

Apricot0 commented 9 months ago

Hi, I am experiencing the same issue. I created a Docker image of the Lonestar application and run it on a cluster with a Quadro RTX 6000 GPU. Even with the small sample inputs, BFS and some other applications hang. Have you resolved the issue?

FROM ubuntu:20.04
ENV BUILD_DIR=/galois/build
ENV SRC_DIR=/galois
WORKDIR /galois
RUN apt-get update \
      && apt-get install -qy \
      apt-transport-https \
      ca-certificates \
      curl \
      wget \
      gnupg \
      software-properties-common \
      && curl -fL https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - \
      && apt-add-repository -y 'deb http://apt.llvm.org/focal/ llvm-toolchain-focal main' \
      && apt-get update
RUN apt-get install -qy \
      ccache \
      clang++-10 \
      clang-10 \
      clang-format-10 \
      clang-tidy-10 \
      # clang++ \
      # clang \
      # clang-format \
      # clang-tidy \
      cmake \
      # g++-8 \
      # gcc-8 \
      g++-7 \
      gcc-7 \
      git \
      gosu \
      libfmt-dev \
      libopenmpi-dev \
      libboost-all-dev\
      libnuma-dev\
      llvm-10-dev \
      libpapi-dev\
      mpich \
      libeigen3-dev\
      python3-pip \
      python-is-python3 \
      # && update-alternatives --verbose --install /usr/bin/gcc gcc /usr/bin/gcc-8 90 \
      # && update-alternatives --verbose --install /usr/bin/g++ g++ /usr/bin/g++-8 90 \
      && update-alternatives --verbose --install /usr/bin/gcc gcc /usr/bin/gcc-7 90 \
      && update-alternatives --verbose --install /usr/bin/g++ g++ /usr/bin/g++-7 90 \
      && update-alternatives --verbose --install /usr/bin/clang clang /usr/bin/clang-10 90 \
      && update-alternatives --verbose --install /usr/bin/clang++ clang++ /usr/bin/clang++-10 90 \
      && rm -rf /var/lib/apt/lists/*
# RUN apt-get install llvm \ 
#       libpapi-dev \ 
#       mpich \ 
#       libeigen3-dev
RUN pip3 install --upgrade --no-cache-dir pip setuptools \
      && pip3 install --no-cache-dir conan==1.24

RUN wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run \
      && chmod +x cuda_10.2.89_440.33.01_linux.run \
      && ./cuda_10.2.89_440.33.01_linux.run --silent --toolkit --override --toolkitpath=/usr/local/cuda \
      && rm -rf cuda_10.2.89_440.33.01_linux.run

# RUN echo "export PS1='\\W$ '" >> /root/.bashrc
ENV HOME=/root
COPY . .
RUN mkdir -p $BUILD_DIR
WORKDIR ${BUILD_DIR}
RUN export CUDACXX=/usr/local/cuda/bin/nvcc \
      && cmake $SRC_DIR -DGALOIS_CUDA_CAPABILITY="7.5" -DCMAKE_BUILD_TYPE=Debug
# RUN cmake -S $SRC_DIR -B $BUILD_DIR -DCMAKE_BUILD_TYPE=Release
RUN cd $BUILD_DIR/inputs && mkdir small_inputs
RUN mv $SRC_DIR/small_inputs_for_lonestar_test.tar.gz $BUILD_DIR/inputs/small_inputs
RUN cd $BUILD_DIR/inputs/small_inputs \
      && tar -xzvf small_inputs_for_lonestar_test.tar.gz

WORKDIR $BUILD_DIR
# RUN make -j
# RUN make install

WORKDIR /galois
RUN chmod +x /galois/entry.sh
# ENTRYPOINT ["sh", "-c", "cd /galois/build && make test && /bin/sh"]
ENTRYPOINT ["/galois/entry.sh"]
# CMD ["/bin/sh"]
# ENTRYPOINT ["/entry.sh"]
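
Since several applications are reported to hang, it may help to run each one under a time limit so that a hang is recorded instead of blocking the container. The following is a minimal sketch using only the Python standard library; the function name and the time limit are hypothetical, and the sketch does not diagnose why a kernel hangs.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (hung, returncode).

    hung is True when the time limit expired; subprocess.run kills the
    child process in that case and returncode is None.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return False, proc.returncode
    except subprocess.TimeoutExpired:
        return True, None


if __name__ == "__main__" and len(sys.argv) > 2:
    # Example: python watchdog.py 300 ./bfs -l -s 570170 hollywood-2009.gr
    hung, rc = run_with_timeout(sys.argv[2:], float(sys.argv[1]))
    print("HUNG" if hung else f"exited with code {rc}")
```

A run that legitimately takes longer than the limit is also reported as hung, so the limit should be chosen well above the expected runtime on the small inputs.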