Neuroglycerin / neukrill-net-work

NDSB competition repository for scripting, note taking and writing submissions.
MIT License

Virtualenv problems #37

Closed: gngdb closed this 9 years ago

gngdb commented 9 years ago

For some reason, following the standard instructions on our tools repo to install the virtualenv results in a venv that can't run alexnet_based.json. Only the virtualenv I've set up on my account seems to work, but I've added some extra packages to it in the meantime, so it's not clear whether something I added fixes it or whether there's some quirk of my environment. More details of the errors to follow.

gngdb commented 9 years ago

Trying to recreate the error on my system by installing two new venvs following the instructions, one with --no-site-packages and one without.
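A minimal sketch of that two-venv comparison, assuming Python 3's standard-library `venv` module instead of the `virtualenv` tool the instructions use; here `system_site_packages=False` plays the role of `--no-site-packages`. The directory names `venv_isolated` and `venv_system` are hypothetical.

```python
import os
import tempfile
import venv


def make_test_venvs(base_dir):
    """Create two fresh envs: one isolated, one that sees system site-packages.

    Directory names are hypothetical; pip is skipped so creation is fast.
    """
    paths = {}
    for name, use_system in (("venv_isolated", False), ("venv_system", True)):
        path = os.path.join(base_dir, name)
        # system_site_packages=False mirrors virtualenv's --no-site-packages
        venv.create(path, system_site_packages=use_system, with_pip=False)
        paths[name] = path
    return paths


def includes_system_packages(venv_path):
    """Read the flag that venv records in the environment's pyvenv.cfg."""
    with open(os.path.join(venv_path, "pyvenv.cfg")) as cfg:
        for line in cfg:
            key, _, value = line.partition("=")
            if key.strip() == "include-system-site-packages":
                return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    for name, path in make_test_venvs(base).items():
        print(name, includes_system_packages(path))
```

If both environments fail in the same way, as they did here, system site-packages can probably be ruled out as the cause.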

Did that; both fresh venvs hit the same error:

Exception: ('The following error happened while compiling the node', DownsampleFactorMax{(2, 2),(2, 2),False}(<CudaNdarrayType(float32, 4D)>), '\n', 'nvcc return status', 2, 'for cmd', '/opt/cuda-5.0.35/bin/nvcc -shared -O3 -arch=sm_35 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=7e6191b286a52f4c1275653fc6d6b81c,-D NPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC -Xlinker -rpath,/afs/inf.ed.ac.uk/user/s08/s0805516/.theano/stonesoup2/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Carbon-x86_64-2.7.8-64/cuda_ndarray -I/afs/inf.ed.ac.uk/user/s08/s0805516/.theano/stonesoup2/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Carbon-x86_64-2.7.8-64/cuda_ndarray -I/opt/cuda-5.0.35/include -I/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrillvenvfresh/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -I/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrillvenvfresh/lib/python2.7/site-packages/theano/sandbox/cuda -o /afs/inf.ed.ac.uk/user/s08/s0805516/.theano/stonesoup2/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Carbon-x86_64-2.7.8-64/tmph5vDH4/418af040422b2c803fec624d00fef917.so mod.cu -L/afs/inf.ed.ac.uk/user/s08/s0805516/.theano/stonesoup2/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Carbon-x86_64-2.7.8-64/cuda_ndarray -L/usr/lib -lpython2.7 -lcudart -lcublas -lcuda_ndarray', '[DownsampleFactorMax{(2, 2),(2, 2),False}(<CudaNdarrayType(float32, 4D)>)]')
gngdb commented 9 years ago

Could be a Theano bug, so I tried updating Theano in the magical venv that works and running the model. It still appears to work in that case, so something else must be different.

gngdb commented 9 years ago

Made a new virtualenv with exactly the same packages; notes on this are here. There must be some other difference between the virtualenvs, but I now have no idea what it is.
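"Same packages" is easiest to confirm by diffing versions, not just names. A minimal sketch, assuming you run `pip freeze` inside each activated venv and pass the two outputs in as strings; the function names are hypothetical.

```python
def parse_freeze(text):
    """Map package name (lower-cased) to pinned version from `pip freeze` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "==" in line:
            name, _, version = line.partition("==")
        else:
            # editable / VCS lines (e.g. "-e git+...") are compared verbatim
            name, version = line, "unpinned"
        packages[name.strip().lower()] = version.strip()
    return packages


def diff_freeze(working_text, broken_text):
    """Return {package: (working_version, broken_version)} for every mismatch.

    A version of None means the package is missing from that environment.
    """
    working = parse_freeze(working_text)
    broken = parse_freeze(broken_text)
    diffs = {}
    for name in sorted(set(working) | set(broken)):
        if working.get(name) != broken.get(name):
            diffs[name] = (working.get(name), broken.get(name))
    return diffs
```

An empty result means the two environments pin identical packages, which points the search at things outside the venv, such as environment variables.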

gngdb commented 9 years ago

When I updated Theano earlier, I must have forgotten to source the environment variables again, so it only didn't fail because it wasn't running on the GPU. Something in the slightly newer version of Theano breaks it. Now trying to roll back Theano in my broken venv to make it work.
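A guard like the following would catch a run that silently falls back to the CPU. It is a minimal sketch, assuming the sourced script configures Theano through the `THEANO_FLAGS` environment variable (comma-separated `key=value` pairs such as `device=gpu,floatX=float32`); settings in `~/.theanorc` are ignored here. Inside a running Theano session, `theano.config.device` reports the device actually configured. The helper names are hypothetical.

```python
import os


def parse_theano_flags(flags):
    """Split a THEANO_FLAGS string like "device=gpu,floatX=float32" into a dict."""
    options = {}
    for item in flags.split(","):
        key, sep, value = item.partition("=")
        if sep:  # ignore empty or malformed entries
            options[key.strip()] = value.strip()
    return options


def expected_device(environ=None):
    """Device requested via THEANO_FLAGS, falling back to Theano's default, 'cpu'.

    Assumption: only THEANO_FLAGS is consulted, not ~/.theanorc.
    """
    environ = os.environ if environ is None else environ
    return parse_theano_flags(environ.get("THEANO_FLAGS", "")).get("device", "cpu")


def require_gpu(environ=None):
    """Raise if the environment does not ask Theano for a GPU device."""
    device = expected_device(environ)
    # old backend uses "gpu"/"gpu0"; the newer gpuarray backend uses "cuda"/"cuda0"
    if not device.startswith(("gpu", "cuda")):
        raise RuntimeError(
            "THEANO_FLAGS requests device=%s; did you source the environment "
            "variables?" % device)
    return device
```

Calling `require_gpu()` at the top of a training script turns a silent CPU run, which can "pass" without touching the CUDA code path, into an immediate error.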

gngdb commented 9 years ago

Installed this arbitrary older commit by cloning the Theano repository after uninstalling the pip-installed version. I can now run alexnet_based on the GPU as before. Will add this to the README.
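The rollback steps can be sketched as a short dry-run script. This is a sketch, not the recorded procedure: the commit used above isn't given here, so `<commit-hash>` is a placeholder, the clone directory is hypothetical, and installing with `pip install <dir>` is an assumption about how the checkout was installed.

```python
import sys

THEANO_REPO = "https://github.com/Theano/Theano.git"


def rollback_commands(commit, clone_dir):
    """Commands that replace a pip-installed Theano with a specific git commit.

    `commit` and `clone_dir` are placeholders supplied by the caller.
    """
    # sys.executable: run this from inside the activated venv being fixed
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", "Theano"],           # remove the pip release
        ["git", "clone", THEANO_REPO, clone_dir],      # fetch the full history
        ["git", "-C", clone_dir, "checkout", commit],  # pin the older commit
        pip + ["install", clone_dir],                  # install from the checkout
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in rollback_commands("<commit-hash>", "Theano"):
        print(" ".join(cmd))
```

To execute instead of print, pass each command to `subprocess.check_call`, which stops at the first failing step.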