LLNL / Kripke

Kripke is a simple, scalable, 3D Sn deterministic particle transport code
BSD 3-Clause "New" or "Revised" License

Cannot run CUDA-enabled version #27

Open willkill07 opened 3 years ago

willkill07 commented 3 years ago

Issue: running kripke.exe (1.2.5-dev, built with CUDA enabled) aborts with an illegal memory access as soon as the steady-state solve begins.

Tagged release 1.2.4 does not exhibit this behavior. I did not perform any sort of bisection to find the culprit, but I suspect it's an issue with RAJA somewhere.

Build environment:

host-config file:

set(CMAKE_BUILD_TYPE "Release" CACHE STRING "")

set(CMAKE_CXX_FLAGS "" CACHE STRING "")
set(CMAKE_CXX_FLAGS_RELEASE "-O3 -ffast-math" CACHE STRING "")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -ffast-math" CACHE STRING "")
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g" CACHE STRING "")

set(ENABLE_CHAI On CACHE BOOL "")
set(ENABLE_CUDA On CACHE BOOL "")
set(CUDA_ARCH "sm_86" CACHE STRING "")
set(ENABLE_OPENMP Off CACHE BOOL "")
set(ENABLE_MPI Off CACHE BOOL "")
set(ENABLE_MPI_WRAPPER Off CACHE BOOL "")

set(CMAKE_CUDA_FLAGS "-restrict -gencode=arch=compute_86,code=sm_86" CACHE STRING "")
set(CMAKE_CUDA_FLAGS_RELEASE "-O3 --expt-extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_FLAGS_RELWITHDEBINFO "-O3 -lineinfo --expt-extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_FLAGS_DEBUG "-O0 -g -G --expt-extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE STRING "")

Output:

~/kripke$ ./build/bin/kripke.exe

   _  __       _         _
  | |/ /      (_)       | |
  | ' /  _ __  _  _ __  | | __ ___
  |  <  | '__|| || '_ \ | |/ // _ \
  | . \ | |   | || |_) ||   <|  __/
  |_|\_\|_|   |_|| .__/ |_|\_\\___|
                 | |
                 |_|        Version 1.2.5-dev

LLNL-CODE-775068

Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC

Kripke is released under the BSD 3-Clause License, please see the
LICENSE file for the full license

This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344.

Author: Adam J. Kunen <kunen1@llnl.gov>

Compilation Options:
  Architecture:           CUDA
  Compiler:               /usr/bin/c++
  Compiler Flags:         "     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           Yes
  CUDA Enabled:           Yes
    NVCC:                 /usr/local/cuda/bin/nvcc
    NVCC Flags:           "-restrict -gencode=arch=compute_86,code=sm_86 -O3 --expt-extended-lambda"
  MPI Enabled:            No
  OpenMP Enabled:         No
  Caliper Enabled:        No

Input Parameters
================

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                32
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       1
    Spatial decomp:        1 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 16 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          CUDA
    Data Layout:           DGZ

Generating Problem
==================

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             1           1 / 1
  (Rx,Ry,Rz) R in XYZ:   1x1x1       1x1x1 / 1x1x1
  (PQR) TOTAL:           1           16 / 16

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                        15360        0.117
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                         786432        6.000
  j_plane                         786432        6.000
  k_plane                         786432        6.000
  mixelem_to_fraction               4352        0.033
  phi                            3276800       25.000
  phi_out                        3276800       25.000
  psi                           12582912       96.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                           12582912       96.000
  sigt_zonal                      131072        1.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                         34238832      261.222

  Generation Complete!

Steady State Solve
==================

CUDAassert: an illegal memory access was encountered /home/williamk/kripke/tpl/raja/include/RAJA/policy/cuda/MemUtils_CUDA.hpp 183
terminate called after throwing an instance of 'std::runtime_error'
  what():  CUDAassert
Aborted (core dumped)

rchen20 commented 2 years ago

Hey @willkill07, sorry for the late reply. I can't reproduce this on our LLNL machines with either gcc/8.3.1 or gcc/8.4.0 and cuda/11.4.1. Could you try again with the latest kripke/develop and the following two additional cmake lines in your host-config?

set(CHAI_ENABLE_RAJA_PLUGIN On CACHE BOOL "")
set(ENABLE_RAJA_PLUGIN On CACHE BOOL "")
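
For reference, a minimal sketch of how the CHAI/CUDA part of the host-config from the original report might look with the two suggested variables appended. Only the entries relevant here are shown, and their placement within the file is an assumption; the rest of the host-config is taken to be unchanged.

```cmake
# CHAI/CUDA entries copied from the host-config in the original report
set(ENABLE_CHAI On CACHE BOOL "")
set(ENABLE_CUDA On CACHE BOOL "")
set(CUDA_ARCH "sm_86" CACHE STRING "")

# Suggested additions: enable the RAJA plugin for CHAI
# (placement after the CHAI/CUDA entries is an assumption)
set(CHAI_ENABLE_RAJA_PLUGIN On CACHE BOOL "")
set(ENABLE_RAJA_PLUGIN On CACHE BOOL "")
```

Reconfiguring in a fresh build directory is probably the safest way to make sure the new entries are picked up.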

rchen20 commented 2 years ago

This enhancement should also help (https://github.com/LLNL/Kripke/pull/38). With it, you only need to specify ENABLE_CHAI=On, without setting CHAI_ENABLE_RAJA_PLUGIN or ENABLE_RAJA_PLUGIN.