fixstars / clpy

OpenCL backend for CuPy

Chainer's ImageNet example (ResNet) #21

Closed LWisteria closed 6 years ago

LWisteria commented 6 years ago

Chainer's ImageNet example (ResNet) doesn't work with current clpy.

LWisteria commented 6 years ago

@t-kitawaki please try this

LWisteria commented 6 years ago

cf. http://proc-cpuinfo.fixstars.com/2017/12/koukaryoku_chainermn/

LWisteria commented 6 years ago

I am preparing the training data on the new primary and secondary machines now. Please wait until that is complete.

ghost commented 6 years ago

Thank you for preparing the training data.

ghost commented 6 years ago

Error output (Titan V):

Traceback (most recent call last):
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/updater.py", line 234, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/optimizer.py", line 536, in update
    loss = lossfun(*args, **kwds)
  File "/home/kitawaki/test-chainer/chainer/examples/imagenet/nin.py", line 28, in __call__
    h = F.max_pooling_2d(F.relu(self.mlpconv1(x)), 3, stride=2)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/links/connection/mlp_convolution_2d.py", line 99, in __call__
    x = f(l(x))
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/links/connection/convolution_2d.py", line 156, in __call__
    x, self.W, self.b, self.stride, self.pad)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/functions/connection/convolution_2d.py", line 467, in convolution_2d
    y, = fnode.apply(args)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/function_node.py", line 245, in apply
    outputs = self.forward(in_data)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/function_node.py", line 337, in forward
    return self.forward_gpu(inputs)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/functions/connection/convolution_2d.py", line 198, in forward_gpu
    cover_all=self.cover_all)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/utils/conv.py", line 121, in im2col_gpu
    h, w, out_h, out_w, kh, kw, sy, sx, ph, pw, dy, dx, col)
  File "clpy/core/elementwise.pxi", line 672, in clpy.core.core.ElementwiseKernel.__call__
  File "clpy/backend/function.pyx", line 149, in clpy.backend.function.Function.linear_launch
  File "clpy/backend/function.pyx", line 112, in clpy.backend.function._launch
  File "clpy/backend/opencl/utility.pyx", line 135, in clpy.backend.opencl.utility.RunNDRangeKernel
  File "clpy/backend/opencl/api.pyx", line 266, in clpy.backend.opencl.api.WaitForEvents
  File "clpy/backend/opencl/exceptions.pyx", line 23, in clpy.backend.opencl.exceptions.check_status
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "./train_imagenet.py", line 164, in <module>
    main()
  File "./train_imagenet.py", line 160, in main
    trainer.run()
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/trainer.py", line 313, in run
    six.reraise(*sys.exc_info())
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/six-1.11.0-py3.6.egg/six.py", line 693, in reraise
    raise value
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/updater.py", line 234, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/optimizer.py", line 536, in update
    loss = lossfun(*args, **kwds)
  File "/home/kitawaki/test-chainer/chainer/examples/imagenet/nin.py", line 28, in __call__
    h = F.max_pooling_2d(F.relu(self.mlpconv1(x)), 3, stride=2)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/links/connection/mlp_convolution_2d.py", line 99, in __call__
    x = f(l(x))
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/links/connection/convolution_2d.py", line 156, in __call__
    x, self.W, self.b, self.stride, self.pad)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/functions/connection/convolution_2d.py", line 467, in convolution_2d
    y, = fnode.apply(args)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/function_node.py", line 245, in apply
    outputs = self.forward(in_data)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/function_node.py", line 337, in forward
    return self.forward_gpu(inputs)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/functions/connection/convolution_2d.py", line 198, in forward_gpu
    cover_all=self.cover_all)
  File "/home/kitawaki/.pyenv/versions/test/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/utils/conv.py", line 121, in im2col_gpu
    h, w, out_h, out_w, kh, kw, sy, sx, ph, pw, dy, dx, col)
  File "clpy/core/elementwise.pxi", line 672, in clpy.core.core.ElementwiseKernel.__call__
  File "clpy/backend/function.pyx", line 149, in clpy.backend.function.Function.linear_launch
  File "clpy/backend/function.pyx", line 112, in clpy.backend.function._launch
  File "clpy/backend/opencl/utility.pyx", line 135, in clpy.backend.opencl.utility.RunNDRangeKernel
  File "clpy/backend/opencl/api.pyx", line 266, in clpy.backend.opencl.api.WaitForEvents
  File "clpy/backend/opencl/exceptions.pyx", line 23, in clpy.backend.opencl.exceptions.check_status
clpy.backend.opencl.exceptions.OpenCLRuntimeError: UNKNOWN ERROR
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "clpy/backend/opencl/env.pyx", line 75, in clpy.backend.opencl.env.release
  File "clpy/backend/opencl/api.pyx", line 245, in clpy.backend.opencl.api.Flush
  File "clpy/backend/opencl/exceptions.pyx", line 23, in clpy.backend.opencl.exceptions.check_status
clpy.backend.opencl.exceptions.OpenCLRuntimeError: UNKNOWN ERROR

LWisteria commented 6 years ago

As noted at https://github.com/fixstars/clpy/wiki/chainer_test_example_results#primary-machine-radeon-1, the NVIDIA environment has a separate problem of its own, so it is better to just use the AMD environment.

ghost commented 6 years ago

Understood.

LWisteria commented 6 years ago

On NVIDIA GPUs, this might have the same cause as #13.

ghost commented 6 years ago

Error output (Vega):

Traceback (most recent call last):
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/updater.py", line 234, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/optimizer.py", line 536, in update
    loss = lossfun(*args, **kwds)
  File "/home/kitawaki/opencl-chainer/chainer/examples/imagenet/nin.py", line 28, in __call__
    h = F.max_pooling_2d(F.relu(self.mlpconv1(x)), 3, stride=2)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/links/connection/mlp_convolution_2d.py", line 99, in __call__
    x = f(l(x))
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/links/connection/convolution_2d.py", line 156, in __call__
    x, self.W, self.b, self.stride, self.pad)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/functions/connection/convolution_2d.py", line 467, in convolution_2d
    y, = fnode.apply(args)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/function_node.py", line 245, in apply
    outputs = self.forward(in_data)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/function_node.py", line 337, in forward
    return self.forward_gpu(inputs)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/functions/connection/convolution_2d.py", line 199, in forward_gpu
    y = cuda.cupy.tensordot(
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "./train_imagenet.py", line 164, in <module>
    main()
  File "./train_imagenet.py", line 160, in main
    trainer.run()
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/trainer.py", line 313, in run
    six.reraise(*sys.exc_info())
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/six-1.11.0-py3.6.egg/six.py", line 693, in reraise
    raise value
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/training/updater.py", line 234, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/optimizer.py", line 536, in update
    loss = lossfun(*args, **kwds)
  File "/home/kitawaki/opencl-chainer/chainer/examples/imagenet/nin.py", line 28, in __call__
    h = F.max_pooling_2d(F.relu(self.mlpconv1(x)), 3, stride=2)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/links/connection/mlp_convolution_2d.py", line 99, in __call__
    x = f(l(x))
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/links/connection/convolution_2d.py", line 156, in __call__
    x, self.W, self.b, self.stride, self.pad)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/functions/connection/convolution_2d.py", line 467, in convolution_2d
    y, = fnode.apply(args)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/function_node.py", line 245, in apply
    outputs = self.forward(in_data)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/function_node.py", line 337, in forward
    return self.forward_gpu(inputs)
  File "/home/kitawaki/.pyenv/versions/clpy/lib/python3.6/site-packages/chainer-3.3.0-py3.6.egg/chainer/functions/connection/convolution_2d.py", line 199, in forward_gpu
    y = cuda.cupy.tensordot(
AttributeError: module 'cupy' has no attribute 'tensordot'

ghost commented 6 years ago

tensordot simply cannot be found directly under cuda.cupy, so it works once the following line is added to cupy/__init__.py: from clpy.linalg.product import tensordot # NOQA
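To illustrate why a one-line re-export is enough, here is a minimal sketch using in-memory stand-in modules. The names fakepkg, fakepkg.linalg.product and the toy tensordot body are hypothetical and exist only for this illustration; they are not clpy's code.

```python
import types


def tensordot(a, b):
    """Toy placeholder: a plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical submodule that already contains the function.
product = types.ModuleType("fakepkg.linalg.product")
product.tensordot = tensordot

# Hypothetical top-level package whose __init__ does not re-export it yet.
pkg = types.ModuleType("fakepkg")

# Before the re-export, attribute lookup on the package fails.
try:
    pkg.tensordot
    missing_before = False
    message = ""
except AttributeError as exc:
    missing_before = True
    message = str(exc)

# Equivalent of writing "from fakepkg.linalg.product import tensordot"
# inside the package's __init__.py: bind the name at the top level.
pkg.tensordot = product.tensordot

result = pkg.tensordot([1, 2, 3], [4, 5, 6])
print(missing_before, message, result)
```

The function itself never changes; only the top-level name binding is added, which is the same kind of change as the single import line described above.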

ghost commented 6 years ago

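(Some of the) errors of the form AttributeError: module 'cupy' has no attribute 'hoge' probably have the same cause. Should we improve compatibility with cupy, for example by bringing over __init__.py from cupy (it would need modifications)?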
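One way to see how many such names are missing would be to diff the public top-level names of two modules. The helper below is only a sketch: missing_public_names is a hypothetical function, and reference_pkg and candidate_pkg are in-memory stand-ins so that it runs without cupy or clpy installed.

```python
import types


def missing_public_names(reference, candidate):
    """Return the sorted public names that `reference` has but `candidate` lacks."""
    ref_names = {name for name in dir(reference) if not name.startswith("_")}
    return sorted(name for name in ref_names if not hasattr(candidate, name))


# Stand-in for the library whose top-level API is the target.
reference = types.ModuleType("reference_pkg")
reference.tensordot = lambda a, b: None
reference.dot = lambda a, b: None
reference.array = lambda x: None

# Stand-in for the compatible package that exposes only part of that API.
candidate = types.ModuleType("candidate_pkg")
candidate.dot = lambda a, b: None
candidate.array = lambda x: None

missing = missing_public_names(reference, candidate)
print(missing)
```

With real packages, the two arguments would be the imported modules themselves, assuming both can be imported in the same environment.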

LWisteria commented 6 years ago

Ah, I forgot to explain the background: to work around modules that died on import alone, everything except the bare minimum was originally commented out.

So there are many things that work just by uncommenting them (and some that don't).
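A rough way to sort out which imports work and which don't is to try each one and record the failures. This is only a sketch: probe_imports is a hypothetical helper, the module names are stdlib placeholders, and it only catches Python-level exceptions, so an import that crashes the interpreter outright would still need to be tested in a separate process.

```python
import importlib


def probe_imports(module_names):
    """Try to import each name; return (loaded modules, failure messages)."""
    loaded, failed = {}, {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except Exception as exc:  # record the failure instead of aborting
            failed[name] = repr(exc)
    return loaded, failed


# Placeholder names: two stdlib modules and one that does not exist.
loaded, failed = probe_imports(
    ["math", "json", "no_such_module_for_this_sketch"]
)
print(sorted(loaded))
print(failed)
```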

ghost commented 6 years ago

なるほどです

LWisteria commented 6 years ago

Resolved by #23, thank you