danirisdiandita closed this issue 3 years ago
I am getting this error too: "CachedOp requires all inputs to live on the same context. But data0 is on gpu(3) while maskrcnn0_normalizedperclassboxcenterencoder0_means is on gpu(2)", in both FasterRCNN and MaskRCNN training. Something seems to be wrong with the parallelization in the last few updates.
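For context, this is roughly the multi-GPU setup pattern used by GluonCV's detection training scripts (a paraphrased sketch, not the exact script; the model name and context list are placeholders). The CachedOp check quoted in the error enforces that, for each forward pass, the data shard and the parameter copies it is combined with share one context.

```python
# Paraphrased sketch of the usual multi-GPU setup in the GluonCV training
# scripts, not the exact script. After reset_ctx, every parameter should have
# a copy on every GPU in the ctx list; the error above means one of the
# encoder constants ended up on a different GPU than its data shard.
import mxnet as mx
from gluoncv import model_zoo

ctx = [mx.gpu(i) for i in range(4)]                      # e.g. gpu(0) .. gpu(3)
net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained_base=True)

# Initialize the heads that are not covered by the pretrained base
# (this mirrors the loop in GluonCV's train scripts, which checks the
# private _data attribute), then place all parameters on every GPU.
for param in net.collect_params().values():
    if param._data is None:
        param.initialize()
net.collect_params().reset_ctx(ctx)                      # copy every parameter to every GPU
net.hybridize(static_alloc=True)                         # forward now goes through CachedOp
```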
I got the same error:
Exception in thread Thread-7:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/user/mxnet1.8/xray_gluon_mxnet1.9.0/gluoncv/utils/parallel.py", line 105, in _worker
out = parallel.forward_backward(x)
File "/home/user/mxnet1.8/xray_gluon_mxnet1.9.0/gluoncv/model_zoo/rcnn/mask_rcnn/data_parallel.py", line 48, in forward_backward
cls_targets, box_targets, box_masks, indices = self.net(data, gt_box, gt_label)
File "/home/user/mxnet1.8/xray_gluon_mxnet1.9.0/venv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 825, in __call__
out = self.forward(*args)
File "/home/user/mxnet1.8/xray_gluon_mxnet1.9.0/venv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1482, in forward
return self._call_cached_op(x, *args)
File "/home/user/mxnet1.8/xray_gluon_mxnet1.9.0/venv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1225, in _call_cached_op
out = self._cached_op(*cargs)
File "/home/user/mxnet1.8/xray_gluon_mxnet1.9.0/venv/lib/python3.8/site-packages/mxnet/_ctypes/ndarray.py", line 148, in __call__
check_call(_LIB.MXInvokeCachedOpEx(
File "/home/user/mxnet1.8/xray_gluon_mxnet1.9.0/venv/lib/python3.8/site-packages/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
File "../src/imperative/cached_op.cc", line 777
MXNetError: Check failed: inputs[i]->ctx() == default_ctx (gpu(0) vs. gpu(1)) : CachedOp requires all inputs to live on the same context. But data0 is on gpu(1) while maskrcnn0_normalizedperclassboxcenterencoder0_means is on gpu(0)
MXNet was built from source from the v1.x branch (1.9.0).
@zhreshold
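A quick way to see which parameters disagree with the data context before the hybridized forward is called is something like the following hedged sketch. Here net and data are placeholders for the objects used inside the forward_backward call in the traceback above; this is only a diagnostic, not a fix.

```python
# Hedged diagnostic sketch: list every parameter whose copies do not include the
# context the data shard lives on. `net` and `data` stand for the objects used
# inside gluoncv's forward_backward call; this does not fix the bug, it only
# shows which inputs violate the CachedOp invariant from the traceback above.
def report_context_mismatch(net, data):
    for name, param in net.collect_params().items():
        if data.context not in param.list_ctx():
            print('%s lives on %s but the data is on %s'
                  % (name, param.list_ctx(), data.context))
```

Calling this right before the self.net(data, gt_box, gt_label) line in data_parallel.py should show which parameters were left behind on another GPU when the error triggers.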
I got the same error with the latest script and the MXNet 1.9 nightly wheel as well.
By the way, the warning "WARNING:root:Batch size cannot be evenly split. Trying to shard 8 items into 3 shards" might be resolved by setting the batch size to a multiple of the number of GPUs; see the sketch below.
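For reference, the uneven-shard warning comes from splitting a batch that is not divisible by the number of devices. A minimal sketch, assuming the script shards batches the way gluon.utils.split_and_load does, and using CPU contexts as stand-ins for the 3 GPUs so it runs anywhere:

```python
# Minimal sketch of the uneven-split situation behind the warning, assuming the
# training script shards each batch across devices the way split_and_load does.
# CPU contexts stand in for the 3 GPUs so the snippet runs without GPUs.
import mxnet as mx
from mxnet import gluon

ctx = [mx.cpu(i) for i in range(3)]                 # stand-ins for 3 GPUs
batch = mx.nd.arange(8).reshape((8, 1))             # batch size 8, not divisible by 3

shards = gluon.utils.split_and_load(batch, ctx_list=ctx, even_split=False)
print([s.shape[0] for s in shards])                 # uneven shard sizes

# A batch size that is a multiple of the number of GPUs (6 or 9 here) keeps the
# shards even and avoids the warning.
```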
@zhreshold @szha Please, can you take a look at this issue?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@zhreshold @szha By the way, I can still confirm this issue.
@zhreshold would you like to reopen this issue?
I got an error with 1 GPU (out of memory), shown below:
I got an error with 2 GPUs by running:
and got an error with 3 GPUs by running, shown below: