ROCm / hipamd

35 stars 37 forks source link

Segmentation fault in hipChildGraphNode::UpdateEventWaitLists #31

Open FreddieWitherden opened 2 years ago

FreddieWitherden commented 2 years ago

Consider the following example (which is written in Python but translates 1:1 to HIP C and should be easy enough to follow) and uses hipGraphAddChildGraphNode to embed a captured graph inside of another graph:

from ctypes import c_int, c_double, c_void_p

import numpy as np

from pyfr.backends.hip.driver import HIP, HIPGraph
from pyfr.backends.hip.compiler import HIPRTC

N = 1024*4096

hip = HIP()
hip.set_device(0)

stream = hip.create_stream()

a_hp = hip.pagelocked_empty((N,), float)
b_hp = hip.pagelocked_empty((N,), float)
c_hp = hip.pagelocked_empty((N,), float)

a_hp[:] = np.random.randn(N)
b_hp[:] = np.random.randn(N)

a_cu = hip.mem_alloc(a_hp.nbytes)
b_cu = hip.mem_alloc(b_hp.nbytes)
c_cu = hip.mem_alloc(c_hp.nbytes)

hiprtc = HIPRTC()

src = '''extern "C" __global__
void add(int n, double x, double *a, double *b, double *c)
{
int i = blockIdx.x*blockDim.x + threadIdx.x;
if (i < n) c[i] = a[i] + x*b[i];
}
'''
mod = hip.load_module(hiprtc.compile('kern', src))

fun = mod.get_function('add', [c_int, c_double] + 3*[c_void_p])
params = fun.make_params((N // 128, 1, 1), (128, 1, 1))
params.set_args(N, 1.1, a_cu, b_cu, c_cu)

g = HIPGraph(hip)
n1 = g.add_memcpy(a_cu, a_hp, a_hp.nbytes)
n2 = g.add_memcpy(b_cu, b_hp, b_hp.nbytes)

# Flip me!
if True:
    stream.begin_capture()
    fun.exec_async(stream, params)
    gg = stream.end_capture()
    n3 = g.add_graph(gg, deps=[n1, n2])
else:
    n3 = g.add_kernel(params, deps=[n1, n2])

g.add_memcpy(c_hp, c_cu, c_hp.nbytes, deps=[n3])

gi = g.instantiate()
gi.launch(stream)
stream.synchronize()

print(np.allclose(c_hp, a_hp + 1.1*b_hp))

Running this under ROCm 5.2 I get a segmentation fault in:

(gdb) bt
#0  0x00007fff4b2e259c in hipChildGraphNode::UpdateEventWaitLists(std::vector<amd::Event*, std::allocator<amd::Event*> >) ()
from /opt/rocm-5.2.0/lib/libamdhip64.so
#1  0x00007fff4b2abbad in FillCommands(std::vector<std::vector<hipGraphNode*, std::allocator<hipGraphNode*> >, std::allocator<std::vector<hipGraphNode*, std::allocator<hipGraphNode*> > > >&, std::unordered_map<hipGraphNode*, std::vector<hipGraphNode*, std::allocator<hipGraphNode*> >, std::hash<hipGraphNode*>, std::equal_to<hipGraphNode*>, std::allocator<std::pair<hipGraphNode* const, std::vector<hipGraphNode*, std::allocator<hipGraphNode*> > > > >&, std::vector<hipGraphNode*, std::allocator<hipGraphNode*> >&, amd::Command*&, amd::Command*&, amd::HostQueue*) () from /opt/rocm-5.2.0/lib/libamdhip64.so
#2  0x00007fff4b2abfdf in hipGraphExec::Run(ihipStream_t*) () from /opt/rocm-5.2.0/lib/libamdhip64.so
#3  0x00007fff4b2ae87d in ihipGraphLaunch(hipGraphExec*, ihipStream_t*) () from /opt/rocm-5.2.0/lib/libamdhip64.so
#4  0x00007fff4b2ba3a7 in hipGraphLaunch () from /opt/rocm-5.2.0/lib/libamdhip64.so

Switching the if block (so we add the kernel directly to the graph) everything works as expected. As does just running the captured graph bare. Given all API commands execute successfully this indicates a bug.

Disassembling:

Dump of assembler code for function _ZN17hipChildGraphNode20UpdateEventWaitListsESt6vectorIPN3amd5EventESaIS3_EE:
0x00007fff4b2e2580 <+0>:     push   %r13
0x00007fff4b2e2582 <+2>:     push   %r12
0x00007fff4b2e2584 <+4>:     mov    %rsi,%r12
0x00007fff4b2e2587 <+7>:     push   %rbp
0x00007fff4b2e2588 <+8>:     push   %rbx
0x00007fff4b2e2589 <+9>:     sub    $0x28,%rsp
0x00007fff4b2e258d <+13>:    mov    0xa8(%rdi),%rax
0x00007fff4b2e2594 <+20>:    mov    0x8(%r12),%rdx
0x00007fff4b2e2599 <+25>:    mov    (%rsi),%rsi
=> 0x00007fff4b2e259c <+28>:    mov    (%rax),%rax

with the associated C++ being:

https://github.com/ROCm-Developer-Tools/hipamd/blob/06f64e1a53ddc9b0ca02993ff5bff95bea1f8f7f/src/hip_graph_internal.hpp#L413

ansurya commented 2 years ago

Internal ticket created to track the issue .