NervanaSystems / neon

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
http://neon.nervanasys.com/docs/latest
Apache License 2.0

Run time error with pycuda tool #443

Closed c54852533 closed 6 years ago

c54852533 commented 6 years ago

When I test neon by running python cifar10_conv.py, I get the following error:


Traceback (most recent call last):
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/pycuda/tools.py", line 426, in context_dependent_memoize
    return ctx_dict[cur_ctx][args]
KeyError: <pycuda._driver.Context object at 0x7fda89f47df0>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cifar10_conv.py", line 84, in <module>
    cost=cost, callbacks=callbacks)
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/models/model.py", line 184, in fit
    self._epoch_fit(dataset, callbacks)
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/models/model.py", line 206, in _epoch_fit
    x = self.fprop(x)
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/models/model.py", line 237, in fprop
    res = self.layers.fprop(x, inference)
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/layers/container.py", line 396, in fprop
    x = l.fprop(x, inference=inference)
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/layers/layer.py", line 887, in fprop
    bsum=self.batch_sum, layer_op=self)
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/backends/nervanagpu.py", line 1991, in fprop_conv
    return self._execute_conv("fprop", layer, layer.fprop_kernels, repeat)
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/backends/nervanagpu.py", line 2073, in _execute_conv
    kernels.execute(repeat)
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/backends/convolution.py", line 553, in execute
    kernel = kernel_specs.get_kernel(self.kernel_name, self.kernel_options)
  File "<decorator-gen-35>", line 2, in get_kernel
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/pycuda/tools.py", line 430, in context_dependent_memoize
    result = func(*args)
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/backends/kernel_specs.py", line 844, in get_kernel
    run_command([ "ptxas -v -arch", arch, "-o", cubin_file, ptx_file ])
  File "/home/hzhang/anaconda3/envs/neon/lib/python3.6/site-packages/nervananeon-2.6.0-py3.6.egg/neon/backends/kernel_specs.py", line 787, in run_command
    raise RuntimeError("Error(%d):\n%s\n%s" % (proc.returncode, cmd, err))
RuntimeError: Error(136):
ptxas -v -arch sm_61 -o /home/hzhang/.cache/neon/kernels/cubin/sconv_direct_fprop_32x128_bsum.cubin /home/hzhang/.cache/neon/kernels/ptx/sconv_direct_fprop_32x128_bsum.ptx
b'Floating point exception (core dumped)\n'

I installed neon in an Anaconda environment, and my CUDA version is 9.0.176_384.81. Does anyone know what is causing this?
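For readers puzzling over the Error(136) in the traceback above: a minimal sketch, assuming the exit status follows the common shell convention of 128 plus the signal number when a child process is killed by a signal. Under that assumption, 136 corresponds to signal 8, which matches the "Floating point exception" message printed by ptxas. The helper name describe_exit_status is hypothetical and is not part of neon or pycuda.

```python
import signal


def describe_exit_status(status):
    """Return the signal name if status looks like 128 + signal number, else None.

    Assumption: the status was produced by a shell that reports a
    signal-terminated child as 128 + N.
    """
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # Not a known signal number on this platform.
            return None
    return None


# The status reported in the traceback above.
print(describe_exit_status(136))
```

On Linux this reports SIGFPE, which suggests that ptxas itself crashed while assembling the kernel rather than rejecting the PTX file with an ordinary error.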

baojun-nervana commented 6 years ago

@c54852533 It seems there is an issue with CUDA v9. We are using v8:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
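To compare a local toolkit against the release shown above, the check can be scripted. This is only a sketch: it assumes nvcc is on the PATH and that its output contains a "release X.Y" string like the one above. The function names parse_nvcc_release and installed_cuda_release are hypothetical helpers, not part of neon.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run nvcc if it is on the PATH and return its release, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_nvcc_release(result.stdout)


# Parse the last line of the output quoted in this thread.
sample = "Cuda compilation tools, release 8.0, V8.0.61"
print(parse_nvcc_release(sample))
```

Calling installed_cuda_release() on the affected machine and comparing the major version against 8 gives a quick way to confirm whether the environment matches the one the maintainers report using.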

c54852533 commented 6 years ago

@baojun-nervana Problem solved! Thanks!