This is your problem: TypeError: unorderable types: list() >= int()
It means that you gave a list with one element to the StandardUpdater. You have to provide a single integer instead, e.g. by writing StandardUpdater(iterator=train_iterators, optimizer=optimizer, device=args.gpus[0])
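To make the error less mysterious, here is a tiny standalone reproduction (plain Python 3, no Chainer needed); the updater internally checks device >= 0, which fails for a list:

# Standalone reproduction: Python 3 removed ordering comparisons between
# unrelated types such as list and int.
gpus = [0]  # what args.gpus holds after argument parsing
try:
    gpus >= 0  # effectively what StandardUpdater does with its device argument
except TypeError as error:
    print(error)  # "unorderable types: list() >= int()" (Python 3.5 wording)
print(gpus[0] >= 0)  # True -- a single integer id makes the check work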
Thanks!
I made a few more changes to get it working on one GPU: lines https://github.com/Bartzi/see/blob/edcde78993dfde0f79d120252b7edfd440944a9b/chainer/train_svhn.py#L193 and https://github.com/Bartzi/see/blob/edcde78993dfde0f79d120252b7edfd440944a9b/chainer/train_svhn.py#L206, changing them to use updater.device.
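A hypothetical sketch of that kind of change (the real code sits at the two linked lines, which are not reproduced here; the point is only to reuse the updater's device instead of indexing into args.gpus again):

# Hypothetical example: extensions that need a device id can take it from the
# updater that was already configured for single-GPU training.
evaluator = chainer.training.extensions.Evaluator(
    validation_iterator, model, device=updater.device)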
I now have a CuPy NVRTC error.
Exception in main training loop: nvrtc: error: failed to load builtins
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 299, in run
update()
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 223, in update
self.update_core()
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 234, in update_core
optimizer.update(loss_func, *in_arrays)
File "/usr/local/lib/python3.5/dist-packages/chainer/optimizer.py", line 534, in update
loss = lossfun(*args, **kwds)
File "/workdir/workspace/see/chainer/utils/multi_accuracy_classifier.py", line 44, in __call__
self.y = self.predictor(*x)
File "/workdir/workspace/see/chainer/models/svhn.py", line 209, in __call__
h = self.localization_net(images)
File "/workdir/workspace/see/chainer/models/svhn.py", line 41, in __call__
h = self.bn0(self.conv0(images))
File "/usr/local/lib/python3.5/dist-packages/chainer/links/connection/convolution_2d.py", line 154, in __call__
self._initialize_params(x.shape[1])
File "/usr/local/lib/python3.5/dist-packages/chainer/links/connection/convolution_2d.py", line 141, in _initialize_params
self.W.initialize(W_shape)
File "/usr/local/lib/python3.5/dist-packages/chainer/variable.py", line 1250, in initialize
data = initializers.generate_array(self.initializer, shape, xp)
File "/usr/local/lib/python3.5/dist-packages/chainer/initializers/__init__.py", line 46, in generate_array
initializer(array)
File "/usr/local/lib/python3.5/dist-packages/chainer/initializers/normal.py", line 68, in __call__
Normal(s)(array)
File "/usr/local/lib/python3.5/dist-packages/chainer/initializers/normal.py", line 36, in __call__
array[...] = xp.random.normal(*args)
File "/usr/local/lib/python3.5/dist-packages/cupy/random/distributions.py", line 94, in normal
cupy.multiply(x, scale, out=x)
File "/usr/local/lib/python3.5/dist-packages/cupy/core/fusion.py", line 713, in __call__
return self._cupy_op(*args, **kwargs)
File "cupy/core/elementwise.pxi", line 826, in cupy.core.core.ufunc.__call__
File "cupy/util.pyx", line 39, in cupy.util.memoize.decorator.ret
File "cupy/core/elementwise.pxi", line 625, in cupy.core.core._get_ufunc_kernel
File "cupy/core/elementwise.pxi", line 33, in cupy.core.core._get_simple_elementwise_kernel
File "cupy/core/carray.pxi", line 146, in cupy.core.core.compile_with_cache
File "/usr/local/lib/python3.5/dist-packages/cupy/cuda/compiler.py", line 135, in compile_with_cache
base = _preprocess('', options, arch)
File "/usr/local/lib/python3.5/dist-packages/cupy/cuda/compiler.py", line 98, in _preprocess
result = prog.compile(options)
File "/usr/local/lib/python3.5/dist-packages/cupy/cuda/compiler.py", line 245, in compile
raise CompileException(log, self.src, self.name, options)
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/cupy/cuda/compiler.py", line 241, in compile
nvrtc.compileProgram(self.ptr, options)
File "cupy/cuda/nvrtc.pyx", line 98, in cupy.cuda.nvrtc.compileProgram
File "cupy/cuda/nvrtc.pyx", line 108, in cupy.cuda.nvrtc.compileProgram
File "cupy/cuda/nvrtc.pyx", line 53, in cupy.cuda.nvrtc.check_status
cupy.cuda.nvrtc.NVRTCError: NVRTC_ERROR unknown (7)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "chainer/train_svhn.py", line 258, in
Hmm, seems like your CUDA environment is either not correctly installed, or your paths to the CUDA toolkit are not set correctly... but that is just a guess. It's definitely a problem with your development environment.
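If you want to sanity-check the toolkit setup from Python, here is a rough diagnostic sketch (not part of the repo; the library name assumes Linux):

import ctypes
import shutil

print("nvcc:", shutil.which("nvcc"))  # None means the toolkit bin dir is not on PATH
try:
    ctypes.CDLL("libnvrtc.so")  # the library CuPy's NVRTC binding loads; adjust the suffix if needed
    print("libnvrtc: loadable")
except OSError as error:
    print("libnvrtc: NOT loadable ->", error)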
Ok. I’ll try setting it up again and giving it another go. Closing till then. Thanks for your help.
Tried a completely new setup.
I changed the following to make it run on a single GPU.
updater = StandardUpdater(iterator=train_iterators, optimizer=optimizer, device=args.gpus[0])
It seems to be causing a segmentation fault. Any idea why this may be happening?
I was running the training script with the flag -g 0 in the single-GPU case. This seems to be the reason for the above error.
After resolving a few environment issues, I stumbled into this error. Help appreciated.
CMD: python chainer/train_svhn.py curriculum.json /logs --char-map datasets/svhn/svhn_char_map.json --blank-label 0 -b 10
Exception in main training loop: list indices must be integers or slices, not str
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 296, in run
while not stop_trigger(self):
File "/usr/local/lib/python3.5/dist-packages/chainer/training/triggers/interval_trigger.py", line 51, in __call__
epoch_detail = updater.epoch_detail
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 159, in epoch_detail
return self._iterators['main'].epoch_detail
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 313, in run
six.reraise(*sys.exc_info())
File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 296, in run
while not stop_trigger(self):
File "/usr/local/lib/python3.5/dist-packages/chainer/training/triggers/interval_trigger.py", line 51, in __call__
epoch_detail = updater.epoch_detail
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 159, in epoch_detail
return self._iterators['main'].epoch_detail
TypeError: list indices must be integers or slices, not str
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "chainer/train_svhn.py", line 258, in <module>
trainer.run()
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 319, in run
self.updater.finalize()
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 177, in finalize
for iterator in six.itervalues(self._iterators):
File "/usr/local/lib/python3.5/dist-packages/six.py", line 584, in itervalues
return iter(d.values(**kw))
AttributeError: 'list' object has no attribute 'values'
Yeah, I see the problem. You changed the Updater, and I did not tell you that you'll also need to change this line to train_iterators = chainer.iterators.MultiprocessIterator(gpu_datasets[0], args.batch_size). The StandardUpdater cannot handle a list of iterators; it needs just one.
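For reference, a minimal, self-contained sketch of the working single-GPU wiring (dummy data and a dummy model stand in for gpu_datasets[0] and the SEE network):

import numpy as np
import chainer
from chainer.training import StandardUpdater, Trainer

# Dummy dataset in place of gpu_datasets[0]; any Chainer dataset works here.
dataset = chainer.datasets.TupleDataset(
    np.zeros((16, 10), dtype=np.float32), np.zeros(16, dtype=np.int32))
iterator = chainer.iterators.SerialIterator(dataset, batch_size=4)  # one iterator, not a list

model = chainer.links.Classifier(chainer.links.Linear(10, 3))
optimizer = chainer.optimizers.Adam()
optimizer.setup(model)

# device is a single integer GPU id (e.g. args.gpus[0]); -1 keeps it on the CPU.
updater = StandardUpdater(iterator=iterator, optimizer=optimizer, device=-1)
Trainer(updater, (1, 'epoch')).run()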
Ah, but this leads to a segmentation fault with the trainer.run() call. Not sure what's happening.
Interesting. Maybe it works better with a Docker container?
I'll try the docker container and get back.
Tried the Dockerfile to start fresh; still having the same error. This is the gdb backtrace of the segmentation fault.
[New Thread 0x7fff015e5700 (LWP 1807)]
[New Thread 0x7fff00de4700 (LWP 1808)]
[New Thread 0x7fff005e3700 (LWP 1809)]
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff7de373c in elf_machine_rela (skip_ifunc=0, reloc_addr_arg=0x7fffd590ef40, version=0x48, sym=0x7fffd56034c0, reloc=0x7fffd5618640, map=0xeab900) at ../sysdeps/x86_64/dl-machine.h:301
301 ../sysdeps/x86_64/dl-machine.h: No such file or directory.
:man_shrugging: I don't know... have you tried googling the error?
Hmm, OK. That hadn't really proved fruitful. Just updating with more logs from the thread that had the segmentation fault, this time using the faulthandler module.
Current thread 0x00007f9d8c79e700 (most recent call first):
File "/usr/local/lib/python3.5/dist-packages/numpy/lib/arraypad.py", line 142 in _append_const
File "/usr/local/lib/python3.5/dist-packages/numpy/lib/arraypad.py", line 1371 in pad
File "/usr/local/lib/python3.5/dist-packages/chainer/utils/conv.py", line 76 in im2col_cpu
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/pooling/max_pooling_2d.py", line 20 in forward_cpu
File "/usr/local/lib/python3.5/dist-packages/chainer/function_node.py", line 338 in forward
File "/usr/local/lib/python3.5/dist-packages/chainer/function_node.py", line 245 in apply
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/pooling/max_pooling_2d.py", line 303 in max_pooling_2d
File "/workdir/see/chainer/models/svhn.py", line 45 in __call__
File "/workdir/see/chainer/models/svhn.py", line 209 in __call__
File "/workdir/see/chainer/utils/multi_accuracy_classifier.py", line 44 in __call__
File "/usr/local/lib/python3.5/dist-packages/chainer/optimizer.py", line 534 in update
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 234 in update_core
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 223 in update
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 299 in run
File "chainer/train_svhn.py", line 262 in <module>
Segmentation fault (core dumped)
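For anyone reproducing this: the dump above comes from Python's standard library faulthandler module, which prints the Python-level stack when a fatal signal such as SIGSEGV arrives. Enabling it is two lines at the top of the training script:

import faulthandler
faulthandler.enable()  # dumps the traceback on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL

Alternatively, run the script with python -X faulthandler ... without touching the code.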
There is something going wrong in numpy; maybe your installation is faulty, or something on your machine is causing the trouble?
Yes, it's getting pretty hard to debug these environment issues. Anyway, I uninstalled numpy and installed it again. This is the new error stack:
Current thread 0x00007eff1e418700 (most recent call first):
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/normalization/batch_normalization.py", line 178 in forward
File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 135 in forward
File "/usr/local/lib/python3.5/dist-packages/chainer/function_node.py", line 245 in apply
File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 235 in __call__
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/normalization/batch_normalization.py", line 128 in backward
File "/usr/local/lib/python3.5/dist-packages/chainer/function_node.py", line 514 in backward_accumulate
File "/usr/local/lib/python3.5/dist-packages/chainer/variable.py", line 981 in _backward_main
File "/usr/local/lib/python3.5/dist-packages/chainer/variable.py", line 880 in backward
File "/usr/local/lib/python3.5/dist-packages/chainer/optimizer.py", line 539 in update
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 234 in update_core
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 223 in update
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 299 in run
File "chainer/train_svhn.py", line 262 in <module>
Segmentation fault (core dumped)
Guessing that it might be Chainer, I tried the newer 4.0.0 version. That was not fruitful either; same error.
After a lot of attempts at working in a fresh environment (using the included Dockerfile), I have made some progress: it is starting to train. But now I am getting OOM exceptions even with small batch sizes.
My Command:
python3 chainer/train_svhn.py curriculum.json /logs --char-map datasets/svhn/svhn_char_map.json --blank-label 0 -b 8 -g 5
The Error:
format(optimizer.eps))
epoch iteration main/loss main/accuracy lr fast_validation/main/loss fast_validation/main/accuracy validation/main/loss validation/main/accuracy
Exception in main training loop: cudaErrorMemoryAllocation: out of memory
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 302, in run
entry.extension(self)
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/chainer/reporter.py", line 98, in scope
yield
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 302, in run
entry.extension(self)
File "/usr/local/lib/python3.5/dist-packages/chainer/training/extensions/log_report.py", line 83, in __call__
stats_cpu[name] = float(value) # copy to CPU
File "cupy/core/core.pyx", line 1642, in cupy.core.core.ndarray.__float__
File "cupy/core/core.pyx", line 1698, in cupy.core.core.ndarray.get
File "cupy/cuda/memory.pyx", line 329, in cupy.cuda.memory.MemoryPointer.copy_to_host
File "cupy/cuda/runtime.pyx", line 257, in cupy.cuda.runtime.memcpy
File "cupy/cuda/runtime.pyx", line 137, in cupy.cuda.runtime.check_status
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "chainer/train_svhn.py", line 257, in <module>
trainer.run()
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 313, in run
six.reraise(*sys.exc_info())
File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 302, in run
entry.extension(self)
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/chainer/reporter.py", line 98, in scope
yield
File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 302, in run
entry.extension(self)
File "/usr/local/lib/python3.5/dist-packages/chainer/training/extensions/log_report.py", line 83, in __call__
stats_cpu[name] = float(value) # copy to CPU
File "cupy/core/core.pyx", line 1642, in cupy.core.core.ndarray.__float__
File "cupy/core/core.pyx", line 1698, in cupy.core.core.ndarray.get
File "cupy/cuda/memory.pyx", line 329, in cupy.cuda.memory.MemoryPointer.copy_to_host
File "cupy/cuda/runtime.pyx", line 257, in cupy.cuda.runtime.memcpy
File "cupy/cuda/runtime.pyx", line 137, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorMemoryAllocation: out of memory
nvidia-smi output for reference:
+-------------------------------+----------------------+----------------------+
| 5 Tesla M40 Off | 00000000:88:00.0 Off | 0 |
| N/A 30C P8 17W / 250W | 0MiB / 11443MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
So far so good. Did you make any changes to the code? You should not run out of memory if you are using the original code, the provided SVHN data, and a batch size like the one you use.
Not really. I made a fresh clone of the repo and just changed the file paths so the images load properly (my Docker-specific settings were causing a 'could not load file' error).
This is the only change made:
In file_dataset.py:

def load_image(self, file_name):
    file_name = os.path.basename(file_name)  # new line to correct the file path
    with Image.open(os.path.join(self.base_dir, file_name)) as the_image:
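For context, a self-contained sketch of that patched loader (the class skeleton and the return value are illustrative; only the basename line is the actual change):

import os
import numpy as np
from PIL import Image

class FileDataset:
    def __init__(self, base_dir):
        self.base_dir = base_dir

    def load_image(self, file_name):
        # Drop any absolute prefix baked into the ground-truth file so the
        # image resolves relative to base_dir inside the container.
        file_name = os.path.basename(file_name)
        with Image.open(os.path.join(self.base_dir, file_name)) as the_image:
            return np.asarray(the_image, dtype=np.float32) / 255.0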
Hmm, the only thing I can think of is that some part of the code is keeping a reference to some GPU data... It might help to have a look at the memory usage of the GPU with watch -n 0.5 nvidia-smi and see whether the network trains for more than one iteration. If that is the case, your problem is related to something like that.
Otherwise I don't really know what is causing your problem...
You could try to debug it, run each layer of the network in the debugger, and examine the memory usage in order to identify the part where you get that problem.
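One way to do that from inside the code: recent CuPy versions expose their default memory pool, so you can print how much GPU memory is held between layers or iterations (a debugging sketch, not part of the repo):

import cupy

pool = cupy.get_default_memory_pool()
print('used bytes:', pool.used_bytes())   # memory backing live arrays right now
print('held bytes:', pool.total_bytes())  # memory the pool has reserved from CUDA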
OK, thanks, I will give it a shot. But before that, I will try an alternate GPU, just to double-check.
Finally got a trained network with a different GPU! 🎉 Could be that I had issues with GPU references like you suggested; unsure though. Thanks for all your time. Next step: evaluation and visualizing the results :)
I hope all goes well!
I also met the same problem on a single GPU. I modified the script (train_svhn.py) as mentioned above, but the terminal outputs this: cupy.cuda.driver.CUDADriverError: CUDA_ERROR_UNKNOWN: unknown error. Do you have any idea?
python ../../chainer/train_svhn.py --char-map ./svhn_char_map.json -b 4 ./crops/curriculum.json ./log/ --blank-label 0 -g 0
Exception in main training loop: CUDA_ERROR_UNKNOWN: unknown error
Traceback (most recent call last):
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 160, in update_core
optimizer.update(loss_func, *in_arrays)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/optimizer.py", line 640, in update
loss = lossfun(*args, **kwds)
File "/home/klwang/Data2/SEE/see/chainer/utils/multi_accuracy_classifier.py", line 44, in __call__
self.y = self.predictor(*x)
File "/home/klwang/Data2/SEE/see/chainer/models/svhn.py", line 209, in __call__
h = self.localization_net(images)
File "/home/klwang/Data2/SEE/see/chainer/models/svhn.py", line 41, in __call__
h = self.bn0(self.conv0(images))
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/links/connection/convolution_2d.py", line 172, in __call__
self._initialize_params(x.shape[1])
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/links/connection/convolution_2d.py", line 159, in _initialize_params
self.W.initialize(W_shape)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/variable.py", line 1411, in initialize
data = initializers.generate_array(self.initializer, shape, xp)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/initializers/__init__.py", line 46, in generate_array
initializer(array)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/initializers/normal.py", line 68, in __call__
Normal(s)(array)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/initializers/normal.py", line 36, in __call__
array[...] = xp.random.normal(**args)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/cupy/random/distributions.py", line 94, in normal
cupy.multiply(x, scale, out=x)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/cupy/core/fusion.py", line 717, in __call__
return self._cupy_op(*args, **kwargs)
File "cupy/core/elementwise.pxi", line 839, in cupy.core.core.ufunc.__call__
File "cupy/util.pyx", line 39, in cupy.util.memoize.decorator.ret
File "cupy/core/elementwise.pxi", line 638, in cupy.core.core._get_ufunc_kernel
File "cupy/core/elementwise.pxi", line 33, in cupy.core.core._get_simple_elementwise_kernel
File "cupy/core/carray.pxi", line 146, in cupy.core.core.compile_with_cache
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/cupy/cuda/compiler.py", line 166, in compile_with_cache
ls.add_ptr_data(ptx, six.u('cupy.ptx'))
File "cupy/cuda/function.pyx", line 203, in cupy.cuda.function.LinkState.add_ptr_data
File "cupy/cuda/function.pyx", line 205, in cupy.cuda.function.LinkState.add_ptr_data
File "cupy/cuda/driver.pyx", line 119, in cupy.cuda.driver.linkAddData
File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "../../chainer/train_svhn.py", line 257, in <module>
trainer.run()
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/training/trainer.py", line 320, in run
six.reraise(*sys.exc_info())
File "/home/klwang/.local/lib/python3.5/site-packages/six.py", line 693, in reraise
raise value
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 160, in update_core
optimizer.update(loss_func, *in_arrays)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/optimizer.py", line 640, in update
loss = lossfun(*args, **kwds)
File "/home/klwang/Data2/SEE/see/chainer/utils/multi_accuracy_classifier.py", line 44, in __call__
self.y = self.predictor(*x)
File "/home/klwang/Data2/SEE/see/chainer/models/svhn.py", line 209, in __call__
h = self.localization_net(images)
File "/home/klwang/Data2/SEE/see/chainer/models/svhn.py", line 41, in __call__
h = self.bn0(self.conv0(images))
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/links/connection/convolution_2d.py", line 172, in __call__
self._initialize_params(x.shape[1])
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/links/connection/convolution_2d.py", line 159, in _initialize_params
self.W.initialize(W_shape)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/variable.py", line 1411, in initialize
data = initializers.generate_array(self.initializer, shape, xp)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/initializers/__init__.py", line 46, in generate_array
initializer(array)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/initializers/normal.py", line 68, in __call__
Normal(s)(array)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/chainer/initializers/normal.py", line 36, in __call__
array[...] = xp.random.normal(**args)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/cupy/random/distributions.py", line 94, in normal
cupy.multiply(x, scale, out=x)
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/cupy/core/fusion.py", line 717, in __call__
return self._cupy_op(*args, **kwargs)
File "cupy/core/elementwise.pxi", line 839, in cupy.core.core.ufunc.__call__
File "cupy/util.pyx", line 39, in cupy.util.memoize.decorator.ret
File "cupy/core/elementwise.pxi", line 638, in cupy.core.core._get_ufunc_kernel
File "cupy/core/elementwise.pxi", line 33, in cupy.core.core._get_simple_elementwise_kernel
File "cupy/core/carray.pxi", line 146, in cupy.core.core.compile_with_cache
File "/home/klwang/Software/anaconda2/envs/MXNET3/lib/python3.5/site-packages/cupy/cuda/compiler.py", line 166, in compile_with_cache
ls.add_ptr_data(ptx, six.u('cupy.ptx'))
File "cupy/cuda/function.pyx", line 203, in cupy.cuda.function.LinkState.add_ptr_data
File "cupy/cuda/function.pyx", line 205, in cupy.cuda.function.LinkState.add_ptr_data
File "cupy/cuda/driver.pyx", line 119, in cupy.cuda.driver.linkAddData
File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_UNKNOWN: unknown error
Can you run nvidia-smi on your machine? Do any of the CUDA examples work?
I have run CUDA successfully with Caffe/MXNet and other frameworks, and the CUDA examples also work fine.
Hmm, good question then. I think it is because of your environment. Did you check that you have the most recent driver, and a cuDNN matching this driver, installed? You could try to reinstall CuPy with verbose output and check for anything that seems odd. But other than that, I cannot tell you what the problem is.
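A generic recipe for that verbose reinstall (standard pip flags, nothing SEE-specific):

pip uninstall -y cupy
pip install cupy --no-cache-dir -vvvv  # the full build log shows which CUDA paths were picked up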
Hello
I am trying to run the training script on the SVHN dataset with the following command:
python chainer/train_svhn.py curriculum.json /logs --char-map datasets/svhn/svhn_char_map.json --blank-label 0 -b 10 -g 0
I get True for chainer.cuda.available and chainer.cuda.cudnn_enabled.
I get the following error:
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "chainer/train_svhn.py", line 147, in <module>
updater = StandardUpdater(iterator=train_iterators, optimizer=optimizer, device=args.gpus)
File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 144, in __init__
if device is not None and device >= 0:
TypeError: unorderable types: list() >= int()
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7fbddd666c50>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
File "/usr/local/lib/python3.5/dist-packages/chainer/iterators/multiprocess_iterator.py", line 242, in terminate
AttributeError: 'NoneType' object has no attribute 'STATUS_TERMINATE'
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7fbddd666d68>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
File "/usr/local/lib/python3.5/dist-packages/chainer/iterators/multiprocess_iterator.py", line 242, in terminate
AttributeError: 'NoneType' object has no attribute 'STATUS_TERMINATE'
Please help resolve. Thanks!