Closed Paperone80 closed 6 years ago
You need to keep the installs of pygpu and libgpuarray in sync. This looks like it's using the old 0.6.X headers with a newer Theano.
As for the nccl error, you can probably ignore that one unless you are trying to use multi-gpu.
What's the best way of keeping the headers in sync? I used to compile libgpuarray myself and install Theano via pip. That worked for Theano 0.9 and pygpu 0.6.1, but the same approach didn't work for pygpu 0.7.5 (later 0.7.1) and Theano 1.0.0.
I tried deleting older files under /usr/local/lib and in the Python directory, but I'm not sure where else older files/headers are floating around. Any help is more than appreciated. Thanks.
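One way to hunt for leftover headers is to scan the usual install prefixes and check whether each copy of the libgpuarray headers mentions `gpucontext_props`, the type the compiler complains about further down. The following is only a rough sketch: the function name `find_symbol_in_headers`, the list of prefixes, and the assumption that the headers live under `<prefix>/include/gpuarray` are illustrative guesses, not something confirmed in this thread.

```python
import os


def find_symbol_in_headers(symbol, roots, subdir="gpuarray"):
    """Map each existing <root>/include/<subdir> to True/False.

    True means at least one .h file under that directory contains `symbol`.
    Roots without such a directory are skipped.
    """
    results = {}
    for root in roots:
        header_dir = os.path.join(root, "include", subdir)
        if not os.path.isdir(header_dir):
            continue
        found = False
        for dirpath, _dirnames, filenames in os.walk(header_dir):
            for name in filenames:
                if not name.endswith(".h"):
                    continue
                path = os.path.join(dirpath, name)
                # errors="replace" keeps odd encodings from aborting the scan
                with open(path, errors="replace") as fh:
                    if symbol in fh.read():
                        found = True
        results[header_dir] = found
    return results


if __name__ == "__main__":
    # Assumed prefixes; add your own install prefix or environment root.
    prefixes = ["/usr", "/usr/local", os.path.expanduser("~/.local")]
    hits = find_symbol_in_headers("gpucontext_props", prefixes)
    for header_dir, ok in hits.items():
        status = "declares" if ok else "does NOT declare (possibly stale)"
        print(header_dir, status, "gpucontext_props")
```

Any directory reported as not declaring the symbol is a candidate for an old 0.6.X install that the compiler may be picking up ahead of the new one.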
What you describe should be ok. How are you compiling libgpuarray/pygpu? Can you post the exact commands you ran?
If you want an easy way out, you can use anaconda/miniconda. Then you can install Theano/pygpu/libgpuarray with conda install -c mila-udem theano pygpu, or just install pygpu with it and Theano via pip.
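Whichever route is used, it can help to confirm which versions Python actually sees afterwards. Below is a minimal sketch using only the standard library; it assumes Python 3.8+ (for `importlib.metadata`, so it will not run on the 3.5 interpreter from the report) and assumes the distributions are registered under the names `Theano` and `pygpu`. The helper names are made up for illustration.

```python
import re
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version):
    """Extract (major, minor) from a version string such as '0.7.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Distribution names are assumptions; adjust if yours differ.
    for name in ("Theano", "pygpu"):
        version = installed_version(name)
        if version is None:
            print(name, "is not installed in this environment")
        else:
            print(name, version, "-> series", major_minor(version))
```

This only reports the Python-side packages. The libgpuarray C library and its headers are installed separately and still need to match the pygpu series.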
Closing, as there has been no news.
I am trying to get Theano 1.0.0 to work with pygpu. Any pygpu v0.7+ results in the following error when using the test script (previous Theano==0.9.0 and pygpu==0.6.1 were working fine):
Using cuDNN version 6021 on context None
Preallocating 1218/12189 Mb (0.100000) on cuda0
Mapped name None to device cuda0: TITAN X (Pascal) (0000:41:00.0)
You can find the C code in this temporary file: /tmp/theano_compilation_error_7o_6szvz
library inux-x86_64.egg/pygpu/gpuarray_api.h:23:90: is not found.
Exception Traceback (most recent call last)
<ipython-input-2-8efec2400760> in <module>()
...
Exception: ('The following error happened while compiling the node', GpuElemwise{exp,no_inplace}(<GpuArrayType(float32, vector)>), '\n', "Compilation failed (return status=1): In file included from ~/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-redhat-7.4-Maipo-x86_64-3.5.2-64/tmpzokb0gyu/mod.cpp:10:0:. /opt/intel/intelpython35/lib/python3.5/site-packages/pygpu-0.7.1-py3.5-linux-x86_64.egg/pygpu/gpuarray_api.h:23:90: error: 'gpucontext_props' has not been declared. static struct PyGpuContextObject (__pyx_api_f_5pygpu_8gpuarray_pygpu_init)(PyObject , gpucontext_props ) = 0;. ^. ", '[GpuElemwise{exp,no_inplace}(<GpuArrayType(float32, vector)>)]')
pygpu is installed in /opt/intel/intelpython35/lib/python3.5/site-packages/pygpu-0.7.1-py3.5-linux-x86_64.egg/pygpu
NumPy version 1.13.3
NumPy relaxed strides checking option: True
NumPy is installed in /opt/intel/intelpython35/lib/python3.5/site-packages/numpy
Python version 3.5.2 |Intel Corporation| (default, Oct 20 2016, 03:10:33) [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
nose version 1.3.7
4 errors due to 'Could not load "libnccl.so": libnccl.so: cannot open shared object file: No such file or directory'
Any ideas why this error keeps happening and how to resolve it? Thanks a lot!