ROCm / hcc

HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute currently for the ROCm GPU Computing Platform
https://github.com/RadeonOpenCompute/hcc/wiki

Issue with stack-based binary tree traversal code #308

Closed szellmann closed 7 years ago

szellmann commented 7 years ago

Since updating from Ubuntu 14.04 ROCm 1.4 to Ubuntu 16.04 ROCm 1.5 recently, a stack-based binary tree traversal code that worked just fine for me the other day now causes memory access violations.

The code traverses a binary tree. For each inner node, the routine processes one child node and pushes the other child node onto a stack for later processing. This happens in a loop. Memory access violations occur even if I call this routine with a tree that has no inner nodes, so that the offending loop is never entered. I carefully checked that the loop is indeed never entered, which suggests that the mere presence of the loop in the code causes the memory access violations.
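For illustration, the traversal pattern described above can be sketched in plain host-side C++. This is a minimal sketch and not the code from the gist or from Visionaray: `Node`, `Stack`, and `sum_leaves` are hypothetical names, the flat node layout is an assumption, and every inner node is assumed to have exactly two children.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flat node layout (NOT the Visionaray BVH layout).
struct Node {
    int left;   // index of the left child, or -1 for a leaf
    int right;  // index of the right child, or -1 for a leaf
    int value;  // payload, only meaningful for leaves
};

// Fixed-capacity stack of node indices, as one might use in GPU code
// where dynamic allocation is not available. The capacity is an assumption.
struct Stack {
    static constexpr int capacity = 32;
    int data[capacity];
    int size = 0;

    bool empty() const { return size == 0; }
    void push(int v) { assert(size < capacity); data[size++] = v; }
    int pop() { assert(size > 0); return data[--size]; }
};

// Sums the payloads of all leaves reachable from `root`.
inline int sum_leaves(const std::vector<Node>& nodes, int root) {
    Stack st;
    int sum = 0;
    int current = root;
    for (;;) {
        // Inner loop: descend into one child and defer the other.
        // For a tree that consists of a single leaf, this loop is never entered.
        while (nodes[current].left >= 0) {
            st.push(nodes[current].right);
            current = nodes[current].left;
        }
        sum += nodes[current].value;  // process the leaf
        if (st.empty()) break;        // no deferred nodes left
        current = st.pop();           // continue with a deferred child
    }
    return sum;
}
```

Calling `sum_leaves` on a one-element `nodes` vector whose only entry is a leaf mirrors the scenario in this report: the result comes from the leaf alone and the inner `while` loop is skipped.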

I tried my best to strip away all unnecessary code. I hope the following example is self-contained enough and demonstrates the issue: https://gist.github.com/szellmann/3a9f9fec4ad98a838324b1a5d66701e6

A real-world version of the example, the one that used to work before I updated to ROCm 1.5, can be found here (the tree also has only one node here because the example is really tiny): https://gist.github.com/szellmann/5e0bf867732be8caf9f13092834dcd76. The offending loop is part of a library and can be found here: https://github.com/szellmann/visionaray/blob/master/include/visionaray/detail/bvh/intersect.inl. The tree data structure is defined here: https://github.com/szellmann/visionaray/blob/master/include/visionaray/bvh.h

I tried a few variations, e.g. making the pointer in struct stack unsigned* or char*. However, I have not been able to reformulate my code so that the compiler generates a valid program. A fix for this issue would be really appreciated because the tree traversal code is at the heart of my ray tracing library Visionaray!

My configuration is:

> cat /etc/issue
Ubuntu 16.04.2 LTS \n \l
> /opt/rocm/hcc/bin/hcc --version
HCC clang version 5.0.0  (based on HCC 1.0.17172-ac6fc20-ae1d3ca-a102334 )
Target: x86_64-unknown-linux-gnu

and alternatively

> ~/build-hcc/bin/hcc --version
HCC clang version 5.0.0  (based on HCC 1.0.17185-b8e9052-8685ae3-32090c8 )
Target: x86_64-unknown-linux-gnu

szellmann commented 7 years ago

Since there are no comments yet, I just wanted to kindly ask whether there is something I can do to improve the quality of the bug report.

It's of course clear that there can be no solution/fix on such short notice. However, having a fix for this issue is quite important to me because it prevents me from porting my library any further to ROCm, and thus to AMD GPUs at all. If there is something I can do to make investigating the issue easier, please let me know. (I already tried to pinpoint the exact commit where this breaks for me with git-bisect. This is rather hard, however, because many revisions have issues compiling/linking/running under Ubuntu 16.04.)

scchan commented 7 years ago

@szellmann Thanks for the bug report. We are a bit short-handed and haven't had a chance to look at this issue yet. I just took a brief look at the reduced example (very helpful BTW) but I didn't see anything obvious. I'll set aside time to take a deeper look and get back to you.

lawrenceyan commented 7 years ago

Any updates on this?

szellmann commented 7 years ago

@lawrenceyan do you have a similar problem?

lawrenceyan commented 7 years ago

@szellmann Yes, I'm having a similar problem using Ubuntu. Not a huge issue for me as I'm not too focused on Linux support specifically, but I'd definitely be curious to see what ended up being the problem/fix if you do find it.

szellmann commented 7 years ago

Closing this. We just replaced our old Hawaii system with a new system (RX580, PCIe Gen3, Ryzen), and the code executes fine there. Our old setup is probably no longer supported anyway. Sorry for the noise!