NervanaSystems / neon

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
http://neon.nervanasys.com/docs/latest
Apache License 2.0

Core dumped when running mnist_nlp example #408

Closed riccitensor closed 6 years ago

riccitensor commented 6 years ago

I cannot run the MNIST MLP example on Ubuntu 16.04. I run it like this:

(.venv2) gamer@gamer:~/neon$ python examples/mnist_mlp.py 
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa
terminate called after throwing an instance of 'std::runtime_error'
  what():  numpy failed to initialize
Aborted (core dumped)

My nvcc version is 8.0:

(.venv2) gamer@gamer:~/neon$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

When running check_gpu, I get:

(.venv2) gamer@gamer:~/neon$ nvcc neon/backends/util/check_gpu.c && ./a.out; echo $?
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
255

baojun-nervana commented 6 years ago

Can you upgrade to neon 2.2 or 2.3? It sounds like a numpy version conflict.

wei-v-wang commented 6 years ago

Hi,

Your issue might be different, but in this thread https://github.com/NervanaSystems/neon/issues/398#issuecomment-340120280

the following was tried to fix the numpy 0xb vs. 0xa error:

"I just rebuilt neon and tried

pip install --upgrade numpy --no-cache-dir to fix that error"
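
The two hex values in that RuntimeError are numpy C-API versions: the compiled module was built against the newer one (0xb), while the installed numpy provides the older one (0xa). That is why upgrading numpy, or rebuilding against the installed numpy, is the usual remedy. Below is a minimal sketch for pulling the two numbers out of such a message. The helper name parse_api_mismatch is made up for illustration and is not part of numpy or neon.

```python
import re

# Hypothetical helper (not part of numpy or neon): extract the two C-API
# versions from numpy's "compiled against API version" error text.
_PATTERN = re.compile(
    r"compiled against API version (0x[0-9a-fA-F]+) "
    r"but this version of numpy is (0x[0-9a-fA-F]+)"
)


def parse_api_mismatch(message):
    """Return (compiled_against, installed) as ints, or None if no match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


# Message copied from the traceback at the top of this issue.
msg = ("RuntimeError: module compiled against API version 0xb "
       "but this version of numpy is 0xa")
built, installed = parse_api_mismatch(msg)
if built > installed:
    print("installed numpy is older than the one the module was built against")
```

To see which numpy the virtualenv actually picks up, printing numpy.__version__ and numpy.__file__ from inside the same environment is usually enough.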

riccitensor commented 6 years ago

I installed the latest version of neon and upgraded numpy, but now the error is "backend must be one of ('cpu')". The first thing I see is that no CUDA-capable device is detected, even though I have a GeForce GTX 1070, which seems to be supported (https://en.wikipedia.org/wiki/CUDA#GPUs_supported).
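
One way to separate "the driver cannot see the card" from "neon was built without GPU support" is to ask the NVIDIA driver tools directly. The sketch below assumes nvidia-smi is installed with the driver and on PATH. It is independent of neon's own check_gpu utility, and the function name list_nvidia_gpus is only illustrative.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return the lines printed by `nvidia-smi -L`, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # driver tools not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "-L"],  # -L lists the GPUs the driver can see
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # the tool ran but could not talk to the driver
    return [line for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print(gpus if gpus else "no NVIDIA GPU visible to the driver")
```

If the card shows up here but the framework still offers only 'cpu', the build step is the more likely culprit.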

baojun-nervana commented 6 years ago

Which platform are you running on? Can you specify the backend as "-b mkl" or "-b cpu"?

riccitensor commented 6 years ago

Ubuntu 16.04. "-b mkl" -> invalid choice: 'mkl' (choose from 'cpu'). "-b cpu" seems to work (10 epochs of training, misclassification error 2.6%). It looks like, for some reason, the GPU is really not supported, even though this is a GTX 1070 (included in the CUDA-capable device list).
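
The "invalid choice: 'mkl' (choose from 'cpu')" text is the standard argparse rejection when a flag's value is not in its list of allowed choices. Here that suggests this particular build registered only the cpu backend. A minimal sketch of that behaviour follows. make_parser is a hypothetical stand-in, not neon's actual argument parser, and the backend lists are passed in by hand rather than detected.

```python
import argparse


def make_parser(available_backends):
    """Hypothetical stand-in for an example script's argument parser."""
    parser = argparse.ArgumentParser(prog="mnist_mlp.py")
    parser.add_argument(
        "-b", "--backend",
        choices=available_backends,     # only backends the build provides
        default=available_backends[0],
    )
    return parser


# A build that only registered the CPU backend.
cpu_only = make_parser(["cpu"])
print(cpu_only.parse_args(["-b", "cpu"]).backend)

try:
    cpu_only.parse_args(["-b", "mkl"])  # argparse reports "invalid choice"
except SystemExit:
    print("'mkl' rejected: not among the backends this build offers")

# A build that registered all three backends accepts the same flag.
full = make_parser(["gpu", "mkl", "cpu"])
print(full.parse_args(["-b", "mkl"]).backend)
```

The flag value is the same in both cases. What changes is the set of backends the installation makes available.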

baojun-nervana commented 6 years ago

I just tested the example and it supports three backends - gpu, mkl and cpu. I am using the latest version v2.3.0 (we just released the newest version last Friday).

You could download the new version and build again, and watch for any errors during the build.

riccitensor commented 6 years ago

You are right. Re-installed and now works smoothly with all backends.