Open pedrovalerolara opened 3 months ago
Unlike Pedro's issue, this does not occur within the same test. I am seeing the same error while implementing the Basic Hartree-Fock proxy application using JACC. The following error appears for any input with more than 4 atoms:
GPU Kernel Exception
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] throw_if_exception(dev::AMDGPU.HIP.HIPDevice)
@ AMDGPU ~/.julia/packages/AMDGPU/gtxsf/src/exception_handler.jl:123
[3] synchronize(stm::AMDGPU.HIP.HIPStream; blocking::Bool, stop_hostcalls::Bool)
@ AMDGPU ~/.julia/packages/AMDGPU/gtxsf/src/highlevel.jl:53
[4] synchronize (repeats 2 times)
@ ~/.julia/packages/AMDGPU/gtxsf/src/highlevel.jl:49 [inlined]
[5] parallel_for(::Int64, ::typeof(BasicHFProxy._jacc_kernel_threaded_atomix!), ::AMDGPU.ROCArray{Float64, 2, AMDGPU.Runtime.Mem.HIPBuffer}, ::Vararg{Any})
@ JACCAMDGPU ~/.julia/packages/JACC/CPpH7/ext/JACCAMDGPU/JACCAMDGPU.jl:17
[6] bhfp_jacc(inputfile::String; verbose::Bool)
@ BasicHFProxy /autofs/nccsopen-svm1_home/y1e/BasicHFProxy.jl/src/jacc.jl:74
[7] bhfp_jacc
@ /autofs/nccsopen-svm1_home/y1e/BasicHFProxy.jl/src/jacc.jl:18 [inlined]
[8] macro expansion
@ /autofs/nccsopen-svm1_sw/odo/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:669 [inlined]
[9] macro expansion
@ /autofs/nccsopen-svm1_home/y1e/BasicHFProxy.jl/test/runtests_jacc.jl:11 [inlined]
[10] macro expansion
@ /autofs/nccsopen-svm1_sw/odo/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[11] macro expansion
@ /autofs/nccsopen-svm1_home/y1e/BasicHFProxy.jl/test/runtests_jacc.jl:9 [inlined]
[12] macro expansion
@ /autofs/nccsopen-svm1_sw/odo/julia-1.10.4/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[13] top-level scope
@ /autofs/nccsopen-svm1_home/y1e/BasicHFProxy.jl/test/runtests_jacc.jl:6
Using AMDGPU v0.8.11
JACC.BLAS works well when used from a Julia terminal, but it fails when running the AMDGPU JACC.BLAS test (see output below). More work is needed. The JACC.BLAS module is now part of JACC, but the JACC.BLAS test code for the AMDGPU backend is commented out.
JACC.BLAS: Error During Test at /home/wfg/github-runners/cousteau-JACC/ci/_work/JACC.jl/JACC.jl/test/tests_amdgpu.jl:100
Got exception outside of a @test
GPU Kernel Exception
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] throw_if_exception(dev::AMDGPU.HIP.HIPDevice)
@ AMDGPU ~/.julia/packages/AMDGPU/BhNdC/src/exception_handler.jl:123
[3] synchronize(stm::AMDGPU.HIP.HIPStream; blocking::Bool, stop_hostcalls::Bool)
@ AMDGPU ~/.julia/packages/AMDGPU/BhNdC/src/highlevel.jl:53
[4] synchronize (repeats 2 times)
@ ~/.julia/packages/AMDGPU/BhNdC/src/highlevel.jl:49 [inlined]
[5] parallel_for(::Int64, ::typeof(JACC.BLAS._axpy), ::Float64, ::Vararg{Any})
@ JACCAMDGPU ~/github-runners/cousteau-JACC/ci/_work/JACC.jl/JACC.jl/ext/JACCAMDGPU/JACCAMDGPU.jl:12
[6] axpy(n::Int64, alpha::Float64, x::AMDGPU.ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}, y::AMDGPU.ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer})
@ JACC.BLAS ~/github-runners/cousteau-JACC/ci/_work/JACC.jl/JACC.jl/src/JACCBLAS.jl:14
[7] macro expansion
@ ~/github-runners/cousteau-JACC/ci/_work/JACC.jl/JACC.jl/test/tests_amdgpu.jl:125 [inlined]
[8] macro expansion
@ /auto/software/swtree/ubuntu22.04/x86_64/julia/1.9.1/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
[9] top-level scope
@ ~/github-runners/cousteau-JACC/ci/_work/JACC.jl/JACC.jl/test/tests_amdgpu.jl:102
[10] include(fname::String)
@ Base.MainInclude ./client.jl:478
[11] top-level scope
@ ~/github-runners/cousteau-JACC/ci/_work/JACC.jl/JACC.jl/test/runtests.jl:15
[12] include(fname::String)
@ Base.MainInclude ./client.jl:478
[13] top-level scope
@ none:6
[14] eval
@ ./boot.jl:370 [inlined]
[15] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:280
[16] _start()
@ Base ./client.jl:522
Test Summary: | Error  Total  Time
JACC.BLAS     |     1      1  1.9s