clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

Test failure under mesa on xTRSM #229

Closed (anadon closed 8 years ago)

anadon commented 8 years ago

I'm trying to track down this magic, and the problem may very well be in mesa, but in any case it should be documented here. I am also messaging the mesa devs about this. I'd be very appreciative if these could be added as piglit tests, as I'm not familiar enough with piglit to do that myself at this time.

https://gist.github.com/anadon/82d7cdffb1275d71708f

https://gist.github.com/anadon/586f54b5f62a22e339af

The core dump (1131 MB) is here: http://cs.mtu.edu/~jrmarsha/core.27756 since nothing else would really want to host a file that large.

Quick and dirty contact with the mesa devs takes place at irc://irc.freenode.net/dri-devel

anadon commented 8 years ago

Bug entry for mesa: https://bugs.freedesktop.org/show_bug.cgi?id=94273

anadon commented 8 years ago

Part of the issue, as EdB from IRC found, was the use of a function called "cl_amd_print", which appears to be built into the AMD implementation of OpenCL but does not appear in any revision of the specification. This means that all AMD-specific function calls need to be removed in order for compilation to work on strictly standards-compliant OpenCL compilers.
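To make the cleanup described above easier to scope, here is a minimal sketch of a scanner that lists vendor-prefixed identifiers in kernel source text. It is not part of clBLAS or mesa. The helper name `find_amd_identifiers`, the `cl_amd_` prefix pattern (generalized from the one name mentioned in this thread), and the sample kernel are all assumptions made for illustration.

```python
import re

# Assumption: AMD-specific names share the "cl_amd_" prefix, as in the
# "cl_amd_print" identifier mentioned in this thread.
AMD_IDENT = re.compile(r"\bcl_amd_\w+")


def find_amd_identifiers(source: str):
    """Return (line_number, identifier) pairs for every cl_amd_* token."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in AMD_IDENT.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    # Hypothetical kernel text, made up for this example only.
    sample = """__kernel void k(__global float *x) {
    cl_amd_print("debug");
    x[0] *= 2.0f;
}
"""
    for lineno, ident in find_amd_identifiers(sample):
        print(f"line {lineno}: {ident}")
```

Running it over each kernel file would give a list of lines to comment out before retesting under mesa.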

pavanky commented 8 years ago

I thought ignoring optional extensions was part of the spec. I could be wrong.

That said, if removing these fixes the issue, then they should probably be removed. Looking at the files, it feels like they are leftovers from debugging sessions.

Have you tested this after commenting out the offending lines?

anadon commented 8 years ago

I have not. I have this, a few school projects, and 3 other research projects which all divide up my time. Yesterday I got to spend a lot of time working on this, and it'll be a week or more before I can really get at this again.

anadon commented 8 years ago

If I were to, there might also be the issue of another super merge, which you were not appreciative of last time.

pavanky commented 8 years ago

My main issue was grouping everything into a single monolithic commit. Otherwise I appreciate the time you are taking to figure out these issues.

Anyway, I'll see if I can do something about this when I get time. Can you tell me if you are using the Radeon driver or the AMDGPU driver from mesa?

anadon commented 8 years ago

I am using the radeonSI driver. The hardware I've been using is GCN 1.0 (Pitcairn), specifically an R9 270X 4GB. I'm on 11.3 from the git development head, but behavior is identical on 11.1.2.

anadon commented 8 years ago

Related question: is there a way for me to submit my work for validity checking like it is done automatically for pull requests? I have a large patch removing most branching in the CL kernels that I need to run a complete test on.

anadon commented 8 years ago

Tom Stellard is an AMD employee who has done much work on mesa and has bopped in every now and then to help with my problems. He might be a good person to contact about this if the problem ends up being OpenCL related or in mesa.

anadon commented 8 years ago

Updated stack trace with better debug information: https://gist.github.com/anadon/4bc558761e1192e26d0d

anadon commented 8 years ago

https://gist.github.com/anadon/c1ea234ade1e9d076970

anadon commented 8 years ago

Appears to be fixed with LLVM 3.9 and mesa 11.3, but new problems await!