Closed myurkin closed 10 years ago
Hi @myurkin A binary release of clBLAS is available on the releases page: https://github.com/clMathLibraries/clBLAS/releases
Given that you are mixing mingw with visual studio, and given your questions in #43, I suspect that this is an artifact calling between mingw and msvc binaries. Can you retest with the latest binary given above? Are you capable of recompiling clBLAS with mingw and using 1 compiler for everything?
Kent
@myurkin Just to confirm, i compiled everything with msvc 2012 and your example program above worked.
Going to close this issue soon and assume it is resolved.
Sorry for delay. The issue is not resolved. I will provide more information within a few days.
I've tried the latest clBLAS - no changes. I haven't tried recompiling clBLAS since I am lazy to install all dependencies. However, I studied the issue somewhat deeper. The functions that are affected (two mentioned above) accept cl_double2
argument, while those unaffected (two mentioned above) - do not have such arguments. Then I found out that the problem is related to the "__cl_double2 v2" part of the union cl_double2
in cl_platform.h
. (I am using version 1.1 of the header, but I checked that the relevant part are not changed up to version 2.0). If I comment out this part of the union, zaxpy.c
compiles and works fine. The same hack fixed my original (larger) code.
Then I've tried to understand the problem further and looked at definition of __cl_double2
in the same header. There are several branches, but none of them helps. It is probably related to the fact that I am anyway using emmintrin.h
from gcc. So all definitions boil down to something like
double __attribute__ ((__vector_size__ (16)))
or
double __attribute__ ((__vector_size__ (16), __may_alias__))
There are two things I do not understand at the moment:
1) How cl_double2 is represented, when you compile the dll with visual studio? Is __cl_double2
used at all (i.e. is __SSE2__
defined)? If yes, can you say anything about the definition of __m128d
(e.g. from some system header)? Is it compatible with the above definitions?
2) Why adding (or removing) a member of the union changes anything at all? It seems that both size and alignment of the union doesn't change. I suspect that mentioning a "vector" variable in the union somehow changes the way it is passed to the function, but I really lack knowledge on this subject. Do you know, if there is an easy way to see how a function is called?
Searching the Internet I stumbled upon the following seemingly relevant discussion http://stackoverflow.com/q/21743144 but I do not got any specific answers from it. In particular, it says that Microsoft compiler is aligning stack at only 4 bytes in 32bit mode, but it doesn't answer the question 2 above.
Concerning workarounds. Hacking system headers is definitely not a good idea. So two other options are:
1) compile with -mno-sse
. The simplest solution, but will most probably affect the performance of the remaining code.
2) include cl.h
, then redefine cl_double2
to a corrected type (without __cl_double2
), then include clAmdBlas.h
(or clBLAS.h
). Should be fine if cl_double2 is not used in some large arrays or something like that (and its v2 part is never addressed).
Hi @myurkin
Sorry for they delay; when I look at cl_double2
in vs2013, nothing is defined in the union except for cl_double CL_ALIGNED(16) s[2];
However, on MSVC there is a comment in the header wrt CL_ALIGNED
:
Alignment keys neutered on windows because MSVC can't swallow function arguments with alignment requirements
Also, I don't see the SSE2 defined in MSVC, so cl_double2
does not have __attribute__((vector_size(16)))
defined.
So, I imagine that the clBLAS library I built with MSVC does not expect the cl_double2
parameters to be aligned, or that it is a packed data type. The application that you are compiling with mingw could be aligning the cl_double2
parameter in the call site, or pass the entire cl_double2
datatype through XMM register. MSVC would probably pass arrays as a pointer, I imagine. With the way that the header is conditionally compiled with pre-processor defines, mixing compilers would break the ABI between the app and the library.
I think the solution would be to compile everything with the same compiler, so that the ABI between the components is consistent.
Unfortunately, compiling everything with the same compiler is not a convenient solution, as it somewhat contradicts the whole notion of dll. In particular, I want to be responsible only for the Windows executable of my code. While the user can get clBlas.dll
from any source (and potentially a newer version then was available when I was compiling my code).
Concerning __SSE2__
- can it be enabled for release version of the dll? I guess it won't harm. At least, we can test if this solves the issue. However, this may in turn cause problems with the programs that do not use sse2 - again, we can test this.
I do understand that this is an inconvenient issue, but I believe a consequence of the preprocessor defines in cl.h. If the #defines are different with different compilers, the interface risks being different.
You might want to check out this stackoverflow issue wrt SSE2 defines in visual studio.
I don't believe that this is an issue with clBLAS anymore, and will be closing it soon unless you can convince me that there is an action item for us.
I agree, that little can be done on clBLAS side. You may, however, want to mention this issue in some readme supplied with 32-bit Windows dll. Then the code using clBLAS should take care of the compatible definition of the cl_double2.
I am trying to use clAmdBlas 1.10 package on Windows 7 (64 bit) notebook with Nvidia GTX540M. I am using MinGW (with gcc 4.8.1) to compile the programs both in native (64-bit) and in 32-bit mode. I managed to link my program against clAmdBlas and run it in 64-bit, but had issues in 32-bit mode. After investigating it boiled down to double_complex functions.
To test it I took the example example_saxpy.c from the package and modified into all other three modes (D,C, and Z). Then I compiled and tested all 4 (with -m32). Compilation was fine for all 4. All ran fine except Zaxpy. The latter produced segmentation fault when calling clAmdBlasZaxpy. I attach corresponding test in the end.
I have also tested the same executables (without recompiling) on another Windows7 computer with AMD7950 GPU with the same conclusions.
As additional information, in my code I called several complex functions: Zaxpy and Zscal failed, while Zdotu and Dznrm2 worked fine. However, I haven't tested them in details as I did for Zaxpy.
I report a problem for rather old version of the package, because I do not see at easy way to obtain a more recent DLL for Windows.