MS-MA opened this issue 5 years ago (status: Open)
Did you do this: "Please reinstall chainer after you install NCCL."?
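For context: as far as I can tell from the Chainer source, that message comes from MultiprocessParallelUpdater, which only tests whether CuPy's NCCL binding `cupy.cuda.nccl` can be imported. A minimal sketch of the same check using only the standard library, so it can be run inside the SEE environment before starting training (assumption: the import test below mirrors the one Chainer performs):

```python
import importlib


def nccl_binding_available():
    """Return True if CuPy's NCCL binding (cupy.cuda.nccl) imports cleanly.

    Assumption: this mirrors the import test that Chainer's
    MultiprocessParallelUpdater runs before raising 'NCCL is not enabled.'
    """
    try:
        importlib.import_module("cupy.cuda.nccl")
    except ImportError:
        # Covers both "CuPy is not installed" and "CuPy was built without NCCL".
        return False
    return True


if __name__ == "__main__":
    print("NCCL binding available:", nccl_binding_available())
```

If this prints False, the NCCL binding is missing from the installed CuPy. Since Chainer and CuPy became separate packages, the binding lives in CuPy, so reinstalling CuPy after NCCL is installed is probably what matters most.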
CuPy (cupy) version 2.2.0 may not be compatible with this version of Chainer. Please consider installing the supported version by running: $ pip install 'cupy==6.0.0b3' See the following page for more details: https://docs-cupy.chainer.org/en/latest/install.html
So I ran "pip uninstall cupy==2.2.0" and then installed "cupy-cuda80==6.0.0b3".
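To confirm that the swap actually took effect in the `SEE` environment, a small sketch that reports which Chainer and CuPy versions Python really imports (the expected CuPy version is taken from the warning above; the helper names are illustrative, not part of Chainer or CuPy):

```python
import importlib

# Version the Chainer warning above asks for (copied from the warning text).
EXPECTED_CUPY = "6.0.0b3"


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def report(expected_cupy=EXPECTED_CUPY):
    """Collect the imported Chainer and CuPy versions and check CuPy's."""
    chainer_version = installed_version("chainer")
    cupy_version = installed_version("cupy")
    return {
        "chainer": chainer_version,
        "cupy": cupy_version,
        "cupy_matches": cupy_version == expected_cupy,
    }


if __name__ == "__main__":
    print(report())
```

The `cupy-cuda80` wheel still installs a module named `cupy`, so `cupy_matches` should be True after the reinstall.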
Next, I ran this command: "python train_fsns.py /home/data/fsns/image/curriculum.json /home/code/mayongjuan/see-master/fsns-model --char-map ../datasets/fsns/fsns_char_map.json --blank-label 0 -b 20 -g 0 3", but the following problem occurs:
/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:151: UserWarning: optimizer.eps is changed to 2e-08 by MultiprocessParallelUpdater for new batch size.
format(optimizer.eps))
Segmentation fault (core dumped)
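The UserWarning itself looks harmless. As far as I can tell from the Chainer source, MultiprocessParallelUpdater multiplies `eps` of Adam-style optimizers by the number of devices to compensate for the larger effective batch. Assuming the optimizer uses Chainer's default Adam `eps` of 1e-08, the `-g 0 3` run (two GPUs) gives exactly the value in the log:

```python
def scaled_eps(base_eps, num_devices):
    """Sketch of the assumed eps correction for Adam-style optimizers."""
    return base_eps * num_devices


gpus = [0, 3]  # the devices passed via "-g 0 3"
print(scaled_eps(1e-08, len(gpus)))  # 2e-08
```

So the segmentation fault is probably what needs investigating, not the warning.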
Could you tell me why?
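A bare "Segmentation fault (core dumped)" does not say which call crashed. Python's standard `faulthandler` module can dump the Python traceback when the process receives SIGSEGV; `python -X faulthandler train_fsns.py ...` enables it from the command line. A sketch of an equivalent wrapper script (the name `run_fh.py` is hypothetical), which at least covers the main process:

```python
import faulthandler
import runpy
import sys


def run_with_faulthandler(script_path, argv):
    """Run a Python script as __main__ with faulthandler enabled.

    On a fatal signal such as SIGSEGV, faulthandler writes the Python
    traceback of every thread to stderr, showing where the crash happened.
    """
    faulthandler.enable()
    saved_argv = sys.argv
    sys.argv = [script_path] + list(argv)
    try:
        # Returns the globals of the executed script.
        return runpy.run_path(script_path, run_name="__main__")
    finally:
        sys.argv = saved_argv


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python run_fh.py train_fsns.py <train_fsns.py arguments...>
    run_with_faulthandler(sys.argv[1], sys.argv[2:])
```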
Are you sure you are using CUDA 8.0 on your machine?
Yeah, I am sure.
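One way to double-check from inside Python is to ask CuPy which CUDA runtime it was built against and which version the driver supports, via `cupy.cuda.runtime.runtimeGetVersion()` and `driverGetVersion()`. A sketch, assuming a `cupy-cuda80` wheel should report runtime 8.0 while the driver only needs to be at least as new:

```python
def format_cuda_version(version):
    """Turn CUDA's integer version (e.g. 8000) into "major.minor" (e.g. "8.0")."""
    major, rest = divmod(version, 1000)
    return "{}.{}".format(major, rest // 10)


def cupy_cuda_versions():
    """Return (runtime, driver) CUDA versions as seen by CuPy, or None."""
    try:
        from cupy.cuda import runtime
        return (format_cuda_version(runtime.runtimeGetVersion()),
                format_cuda_version(runtime.driverGetVersion()))
    except Exception:  # CuPy missing, or no usable CUDA driver/GPU
        return None


if __name__ == "__main__":
    print("CUDA (runtime, driver):", cupy_cuda_versions())
```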
Well, then I don't know... I have never used CuPy version 6 myself, so that might be the issue. Did you try using the Docker container?
No, I haven't used the Docker container before, but maybe I can try it. Thanks a lot, @Bartzi.
There is a problem with train_fsns.py for FSNS. First: NCCL is already installed in my new environment, following the installation steps, but training still fails:
(SEE) mayongjuan@visionGroup:/home/code/mayongjuan/see/chainer$ python train_fsns.py /home/data/fsns/image/curriculum.json /home/code/mayongjuan/see/fsns-model --blank-label 0 --char-map ../datasets/fsns/fsns_char_map.json -b 50
Traceback (most recent call last):
File "train_fsns.py", line 169, in
updater = MultiprocessParallelUpdater(train_iterators, optimizer, devices=args.gpus)
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 116, in __init__
'NCCL is not enabled. MultiprocessParallelUpdater '
Exception: NCCL is not enabled. MultiprocessParallelUpdater requires NCCL.
Please reinstall chainer after you install NCCL.
(see https://github.com/chainer/chainer#installation).
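Since NCCL is reported as installed but Chainer still says it is not enabled, it may help to separate two questions: can the dynamic loader find libnccl at all, and does the installed Chainer/CuPy pair actually see it? A diagnostic sketch, assuming `MultiprocessParallelUpdater.available()` (a static method in the Chainer versions I know) is present in this install; `ctypes.util.find_library` may miss a library that is only reachable through unusual paths:

```python
import ctypes.util
import importlib


def nccl_library_path():
    """Return the NCCL shared-library name the loader can find, or None."""
    return ctypes.util.find_library("nccl")


def chainer_sees_nccl():
    """Ask Chainer's updater directly; None if Chainer cannot be imported."""
    try:
        module = importlib.import_module(
            "chainer.training.updaters.multiprocess_parallel_updater")
    except ImportError:
        return None
    return module.MultiprocessParallelUpdater.available()


if __name__ == "__main__":
    print("libnccl found:", nccl_library_path())
    print("Chainer sees NCCL:", chainer_sees_nccl())
```

If the library is found but `chainer_sees_nccl()` returns False, CuPy was most likely installed before NCCL (or from a build without NCCL support), which matches the advice in the error message to reinstall after installing NCCL.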
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7f431e1987f0>>
Traceback (most recent call last):
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 244, in terminate
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7f431e1987b8>>
Traceback (most recent call last):
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 244, in terminate
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable
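The two `Exception ignored in ... MultiprocessIterator.__del__` blocks look like shutdown noise: the real error is the NCCL exception above, and the iterators' worker threads are torn down while the interpreter is already exiting. One way to keep that noise out of the log, assuming the iterators expose Chainer's `finalize()` method, is to finalize them explicitly when building the updater fails. `build_updater` below is a hypothetical stand-in for the updater construction in train_fsns.py:

```python
def finalize_all(iterators):
    """Call finalize() on every iterator that has one (e.g. MultiprocessIterator)."""
    for iterator in iterators:
        finalize = getattr(iterator, "finalize", None)
        if callable(finalize):
            finalize()


def build_with_cleanup(iterators, build_updater):
    """Run build_updater(iterators); finalize the iterators if it raises.

    build_updater is a hypothetical callable standing in for, e.g.,
    lambda its: MultiprocessParallelUpdater(its, optimizer, devices=gpus).
    """
    try:
        return build_updater(iterators)
    except Exception:
        # Shut the iterators down before the exception propagates, so their
        # worker threads are not left for interpreter shutdown to clean up.
        finalize_all(iterators)
        raise
```

This does not fix the NCCL problem; it only makes the traceback end at the exception that matters.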
I can't figure out why this problem exists. I'm looking forward to your answer. Thanks a lot!