The problem is probably in the kernel caching code, which assumes a single-context, single-device scenario. makeGemmKernel() only checks whether clKernel has been built, but the kernel may have been built for a different context or device.
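A minimal sketch of the kind of fix this implies, with illustrative names that do not reflect clBLAS's actual internals: key the cache on the (context, device) pair instead of caching a single cl_kernel.

```cpp
#include <CL/cl.h>
#include <map>
#include <utility>

typedef std::pair<cl_context, cl_device_id> KernelKey;
typedef std::map<KernelKey, cl_kernel> KernelCache;

static KernelCache gemmKernelCache; // a real fix would also guard this with a lock

cl_kernel getGemmKernel(cl_context context, cl_device_id device,
                        const char *source, const char *kernelName)
{
    KernelKey key(context, device);
    KernelCache::iterator it = gemmKernelCache.find(key);
    if (it != gemmKernelCache.end())
        return it->second; // already built for this context/device

    // Build the kernel for this specific context and device.
    cl_program program = clCreateProgramWithSource(context, 1, &source, NULL, NULL);
    clBuildProgram(program, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(program, kernelName, NULL);
    clReleaseProgram(program);

    gemmKernelCache[key] = kernel;
    return kernel;
}
```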
We tested with gemv and that works fine. Is there anything special about gemm, and are there any other functions that use the same mechanism as gemm?
gemm and trsm, since trsm calls gemm.
@TimmyLiu If it is just those two files (xgemm.cc and xtrsm.cc), I think I can send a patch that solves the issue within a couple of days. I am assuming there is no restriction against C++11 in clBLAS?
@pavanky I think all the major compilers we use support C++11. xgemm.cc, xtrsm.cc and GemmSpecialCases.cpp all call makeGemmKernel(). They all use static kernels, either generated by Python or checked into the repo.
On Windows, Python 2.7 needs Visual Studio 2008 and Python 3.4 needs Visual Studio 2010. Both of these support C++0x, but not C++11. Strong preference to use C++0x as the standard.
I was able to use Python 2.7 with VS 2015. @hughperkins, are you saying VS 2010 would not work with Python 2.7?
OK, so if all you want to do is use Python as your generator within clBLAS itself, then compiler options are irrelevant.
However, my DeepCL project is a different case: Python 2.7 was built with Visual Studio 2008, and Python 3.4 was built with Visual Studio 2010. Any project that is to be imported into Python and used from Python must be buildable with the exact same compiler.
Edit: an example of a Python script that loads DeepCL using import: https://github.com/hughperkins/DeepCL/blob/master/python/test_deepcl.py#L5
@hughperkins clBLAS only uses Python at build time, not at runtime. It's a tool that generates CL kernels and C++ host code for us, which are then built into the clBLAS library (the Python itself is not shipped).
On a separate topic (which I don't think you asked about, but I note just in case), Python wrappers can be built for clBLAS as an interface to Python. A proof of concept for gemm was created using Cython. I don't remember anymore what requirements Cython imposes on client Python scripts, but I'm sure we would have the same minimum requirements as pyopencl.
@kknox I do not want to derail the topic too much, but I've found ctypes to be fairly straightforward when wrapping a C interface. You just load the clBLAS library and call the appropriate symbol/function. There is also the cffi module, which might provide a bit more safety than ctypes. Going this way reduces the required dependencies (i.e. there is no reason to use Cython).
DeepCL can be used from Python and is imported from Python, and therefore needs to be built with the exact same compiler as Python on Windows.
Since DeepCL links with clBLAS, this means that DeepCL needs to build clBLAS with the exact same compiler as Python on Windows.
On Windows, Python 2.7 is built with Visual Studio 2008 and Python 3.4 is built with Visual Studio 2010. Therefore DeepCL, and by extension clBLAS, should be buildable on Windows with both Visual Studio 2008 and Visual Studio 2010.
Visual Studio 2010 supports C++0x, but does NOT support C++11.
Strong preference that clBLAS can continue to be compiled as C++0x.
@pavanky Do you know how well cffi could integrate with pyopencl? I experimented with Cython because I had seen other projects use Cython and pyopencl together as a proof point. Any kind of Python wrapper for clBLAS needs to be able to accept pyopencl command queues and contexts, and to return pyopencl events.
> Since DeepCL links with clBLAS, this means that DeepCL needs to build clBLAS with the exact same compiler as Python, on Windows.
@hughperkins I assume the reason you care which compiler clBLAS is compiled with is that you care about the MSVC runtime dependency and you link clBLAS statically. Are you restricted from linking clBLAS dynamically? You should be able to mix MSVC runtimes then.
I'm linking dynamically. Mixing MSVC runtimes between DLLs is a dangerous practice that leads to weird, hard-to-explain behavior when heap memory allocated by one side is freed by the other, or when structs created in the caller or the callee are passed across the boundary. You can see the following articles for more details:
If the library is very simple and never needs to pass arrays or structs between caller and callee, then you can sometimes get away with mixing runtimes. Such simple usage is not the case for clBLAS, which passes many structs and does a significant amount of allocating and freeing.
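A minimal illustration of the hazard (hypothetical names; the two modules are shown in one listing for brevity):

```cpp
#include <cstddef>
#include <cstdlib>

// --- library.dll, built against one MSVC runtime ---
extern "C" __declspec(dllexport) char *lib_alloc(size_t n)
{
    return (char *)malloc(n); // allocated on library.dll's CRT heap
}

// --- app.exe, built against a *different* MSVC runtime ---
int main()
{
    char *p = lib_alloc(64);
    free(p); // undefined behavior: app.exe's CRT frees memory that
             // belongs to library.dll's CRT heap
    return 0;
}
```

The conventional remedy is for the library to export a matching free function, so both sides of every allocation live in the same runtime; a library with an API surface as rich as clBLAS's makes that discipline hard to enforce by hand.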
I have fixed the issue in our fork without using C++11. I think we can stop discussing the topic now.
> I have fixed the issue in our fork without using C++11.
Thank you Pavan
I am closing this issue because it is resolved in develop.
This is a very important issue to have fixed :-)
Just to confirm, this is fixed in autogemm, right? Not just for some of the non-generated kernels? (edited to say 'non-generated' rather than 'statically defined')
Yes, this is for the autogemm kernels.
Cool :-) Nice! :-)
The code to reproduce the problem is here:
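For reference, a sketch of the kind of program that triggers the bug (this is not the linked code; it assumes only the public clBLAS API): run the same gemm against two freshly created contexts, so the second call hits a kernel that was built for the first context.

```cpp
#include <clBLAS.h>
#include <cstdio>
#include <vector>

int main()
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    clblasSetup();

    const size_t N = 64;
    std::vector<float> host(N * N, 1.0f);

    for (int i = 0; i < 2; ++i) {
        // A fresh context and queue on each iteration.
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

        cl_mem A = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                  N * N * sizeof(float), &host[0], NULL);
        cl_mem B = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                  N * N * sizeof(float), &host[0], NULL);
        cl_mem C = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                  N * N * sizeof(float), &host[0], NULL);

        // With the old cache, the second iteration reused a cl_kernel
        // belonging to the context created in the first iteration.
        clblasStatus err = clblasSgemm(clblasRowMajor, clblasNoTrans, clblasNoTrans,
                                       N, N, N, 1.0f,
                                       A, 0, N, B, 0, N, 0.0f, C, 0, N,
                                       1, &queue, 0, NULL, NULL);
        if (err != clblasSuccess)
            printf("clblasSgemm failed on iteration %d: %d\n", i, (int)err);
        clFinish(queue);

        clReleaseMemObject(A); clReleaseMemObject(B); clReleaseMemObject(C);
        clReleaseCommandQueue(queue);
        clReleaseContext(ctx);
    }

    clblasTeardown();
    return 0;
}
```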