Closed by gngdb 9 years ago
Trying to recreate the error on my system by installing two new venvs following the instructions, one with --no-site-packages and one without.
Did that; both fresh venvs hit the same error:
Exception: ('The following error happened while compiling the node', DownsampleFactorMax{(2, 2),(2, 2),False}(<CudaNdarrayType(float32, 4D)>), '\n', 'nvcc return status', 2, 'for cmd', '/opt/cuda-5.0.35/bin/nvcc -shared -O3 -arch=sm_35 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=7e6191b286a52f4c1275653fc6d6b81c,-D NPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC -Xlinker -rpath,/afs/inf.ed.ac.uk/user/s08/s0805516/.theano/stonesoup2/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Carbon-x86_64-2.7.8-64/cuda_ndarray -I/afs/inf.ed.ac.uk/user/s08/s0805516/.theano/stonesoup2/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Carbon-x86_64-2.7.8-64/cuda_ndarray -I/opt/cuda-5.0.35/include -I/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrillvenvfresh/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -I/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrillvenvfresh/lib/python2.7/site-packages/theano/sandbox/cuda -o /afs/inf.ed.ac.uk/user/s08/s0805516/.theano/stonesoup2/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Carbon-x86_64-2.7.8-64/tmph5vDH4/418af040422b2c803fec624d00fef917.so mod.cu -L/afs/inf.ed.ac.uk/user/s08/s0805516/.theano/stonesoup2/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Carbon-x86_64-2.7.8-64/cuda_ndarray -L/usr/lib -lpython2.7 -lcudart -lcublas -lcuda_ndarray', '[DownsampleFactorMax{(2, 2),(2, 2),False}(<CudaNdarrayType(float32, 4D)>)]')
Could be a Theano bug. Tried updating Theano in the magical venv that works and running the model; it appears to work in that case, so something else must be different.
Made a new virtualenv with exactly the same packages, notes on this are here. There must be some other difference between the virtualenvs, but I have no idea what it is now.
When I updated Theano before, I must have forgotten to source the environment variables again, so it didn't fail only because it wasn't running on the GPU. So something in the slightly newer version of Theano breaks it. Now trying to roll back my broken venv to make it work.
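To avoid being fooled again by a run that silently falls back to the CPU, a quick check along these lines might help. It is only a sketch: it assumes the GPU is requested through the `THEANO_FLAGS` environment variable (comma-separated `key=value` pairs), which may not be how the variables are set here, and it ignores anything configured in a `.theanorc` file. The function names are made up for illustration.

```python
import os


def theano_flags(env=None):
    """Parse THEANO_FLAGS ("key=value,key=value") into a dict.

    Assumption: the flags live in the environment; .theanorc is not read.
    """
    env = os.environ if env is None else env
    raw = env.get("THEANO_FLAGS", "")
    flags = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip empty or malformed entries
            flags[key.strip()] = value.strip()
    return flags


def gpu_requested(env=None):
    """True if the flags ask for a GPU device (device=gpu, device=gpu0, ...)."""
    device = theano_flags(env).get("device", "cpu")
    return device.startswith("gpu")


if __name__ == "__main__":
    if not gpu_requested():
        print("Warning: THEANO_FLAGS does not request a GPU device")
```

Running this right after activating the venv, before launching the model, would show whether the environment variables were actually sourced.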
Installed this arbitrary older commit by cloning the Theano repository after uninstalling the pip-installed version. Can now run alexnet_based on GPUs as before. Will add this to the README.
For some reason, following the standard instructions on our tools repo to install the virtualenv results in a venv that can't run alexnet_based.json. Only the virtualenv I've set up on my account seems to work, but I've added some extra packages to that in the meantime, so it's not clear whether something I added fixes it or whether there's some quirk in my environment. More details of the errors to follow.
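One way to narrow down what differs between the working venv and a fresh one is to compare their `pip freeze` output. The sketch below assumes each venv's output has been saved as text; the helper names are hypothetical, and entries without a `==` pin (for example editable or git installs) are skipped, so a Theano installed from a clone would not show up in this comparison.

```python
def parse_freeze(text):
    """Map package name -> pinned version from `pip freeze` style text."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and editable/VCS entries
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_freeze(working, broken):
    """Return (only_in_working, only_in_broken, version_changes)."""
    a, b = parse_freeze(working), parse_freeze(broken)
    only_working = sorted(set(a) - set(b))
    only_broken = sorted(set(b) - set(a))
    changed = {
        name: (a[name], b[name])
        for name in sorted(set(a) & set(b))
        if a[name] != b[name]
    }
    return only_working, only_broken, changed
```

For example, with the two outputs read from files, `diff_freeze(open("working.txt").read(), open("fresh.txt").read())` would list packages present in only one venv and any version mismatches (the file names are placeholders).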