@JingChaoLiu @liuxuebo0 Hello, I keep hitting the error below during training and I can't figure out the cause. Someone suggested that the learning rate is too large, but what learning rate would be appropriate? Could you suggest a solution? The full traceback is below, followed by a sketch of the debugging check I'm considering.
```
Traceback (most recent call last):
File "tools/train_net.py", line 186, in <module>
main()
File "tools/train_net.py", line 179, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 85, in train
arguments,
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/engine/trainer.py", line 75, in do_train
loss_dict = model(images, targets)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 367, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 204, in new_fwd
**applier(kwargs, input_caster))
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
proposals, proposal_losses = self.rpn(images, features, targets)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 207, in forward
return self._forward_train(anchors, objectness, rpn_box_regression, targets)
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 223, in _forward_train
anchors, objectness, rpn_box_regression, targets
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 115, in forward_for_single_feature_map
boxlist = remove_small_boxes(boxlist, self.min_size)
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/structures/boxlist_ops.py", line 46, in remove_small_boxes
(ws >= min_size) & (hs >= min_size)
RuntimeError: copy_if failed to synchronize: device-side assert triggered
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
Traceback (most recent call last):
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 238, in <module>
main()
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 234, in main
cmd=process.args)
```
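In case it helps narrow this down, below is the sanity check I am considering adding just before the `remove_small_boxes` call in `maskrcnn_benchmark/modeling/rpn/inference.py`. This is only my own sketch, not PMTD code, and `check_finite` is a helper name I made up. Since `copy_if failed to synchronize: device-side assert triggered` usually surfaces far from the real failure, I would also export `CUDA_LAUNCH_BLOCKING=1` in the shell before launching, and then test whether the proposal boxes already contain NaN, which is what a diverged loss from a too-large learning rate would produce:

```python
# My own debugging sketch, not part of PMTD / maskrcnn_benchmark.
# Run the job with CUDA_LAUNCH_BLOCKING=1 exported in the shell so the
# device-side assert is reported at the real call site, then use this
# check to see whether the boxes are already NaN before remove_small_boxes.
import torch

def check_finite(name, tensor):
    """Fail with a readable error instead of a device-side assert."""
    if torch.isnan(tensor).any():
        raise RuntimeError(
            f"{name} contains NaN - the loss has probably diverged "
            "(learning rate too high?)"
        )
```

I would call `check_finite("proposals", boxlist.bbox)` right before `boxlist = remove_small_boxes(boxlist, self.min_size)`. If it fires, I would retry with a smaller base learning rate passed as a config override on the `train_net.py` command line, e.g. appending `SOLVER.BASE_LR 0.0025` (the value 0.0025 is just my guess for a small GPU count, not a number from the paper). Does that sound like the right way to debug this?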