kingwpf closed this issue 5 years ago.
Did I hit NaN values during training?
Yes.
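The `AssertionError` in `apply_deltas` reported in this thread only fires after a NaN or Inf has already reached the RPN's box deltas. The sketch below catches the problem earlier. It assumes the per-iteration losses are available as a plain dict of floats. `check_losses_finite` is a hypothetical helper, not a detectron2 API. It names the offending loss term as soon as that term stops being finite:

```python
import math
from typing import Dict


def check_losses_finite(loss_dict: Dict[str, float], iteration: int) -> None:
    """Raise FloatingPointError if any loss value is NaN or +/-Inf.

    Hypothetical helper for illustration; detectron2 does not ship a
    function with this name.
    """
    bad = {name: value for name, value in loss_dict.items() if not math.isfinite(value)}
    if bad:
        raise FloatingPointError(f"Non-finite losses at iteration {iteration}: {bad}")


# Illustrative values only (not taken from this issue's logs).
check_losses_finite({"loss_rpn_cls": 0.12, "loss_rpn_loc": 0.05}, iteration=0)
```

With PyTorch tensors in the dict, converting them first with `{k: v.item() for k, v in loss_dict.items()}` gives the float values this sketch expects.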
@ppwwyyxx I used this dataset to train cascade_rcnn with mmdetection without any errors. How can I avoid this issue? Can I set a clip_normal (gradient clipping) value in detectron2?
I do not know, and in general we do not help people find the right settings for training their models.
Ok, thank you.
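Clipping by global gradient norm is what PyTorch's `torch.nn.utils.clip_grad_norm_` does. It rescales all gradients together whenever their combined L2 norm exceeds a threshold. The pure-Python sketch below shows that rule. It assumes gradients are given as flat lists of floats. The function name `clip_by_global_norm` is hypothetical, and the `1e-6` epsilon is chosen to mirror PyTorch's behaviour. This is an illustration, not detectron2 code. Newer detectron2 releases add a `SOLVER.CLIP_GRADIENTS` config group, so check your installed version's `detectron2/config/defaults.py` for it.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> float:
    """Scale all gradients in place so their combined L2 norm is at most max_norm.

    Returns the total norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + 1e-6)  # epsilon avoids division by zero
    if clip_coef < 1.0:  # only shrink, never enlarge
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # global norm = 5.0
print(clip_by_global_norm(grads, max_norm=1.0))  # 5.0
print(grads)  # roughly [[0.6, 0.0], [0.0, 0.8]]
```

In a real PyTorch loop, the equivalent call goes between `loss.backward()` and `optimizer.step()`: `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)`. Clipping limits large but finite gradients. In this sketch, a NaN total norm makes `clip_coef` NaN, and `NaN < 1.0` is false, so gradients that are already non-finite are left untouched rather than repaired.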
I was trying to train a faster_rcnn model on our custom dataset. Here is the error log:

```
[10/16 09:24:28 d2.engine.hooks]: Overall training speed: 326 iterations in 0:01:04 (0.1975 s / it)
[10/16 09:24:28 d2.engine.hooks]: Total training time: 0:01:04 (0:00:00 on hooks)
Traceback (most recent call last):
  File "projects/FasterRcnnr50/train_net.py", line 181, in <module>
    args=(args,),
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/engine/launch.py", line 52, in launch
    main_func(*args)
  File "projects/FasterRcnnr50/train_net.py", line 169, in main
    return trainer.train()
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 329, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 212, in run_step
    loss_dict = self.model(data)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 82, in forward
    proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 172, in forward
    outputs.predict_proposals(),
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn_outputs.py", line 416, in predict_proposals
    pred_anchor_deltas_i, anchors_i.tensor
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/modeling/box_regression.py", line 79, in apply_deltas
    assert torch.isfinite(deltas).all().item()
AssertionError
Exception ignored in: <function TensorboardXWriter.__del__ at 0x7f13a3406ae8>
Traceback (most recent call last):
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/utils/events.py", line 118, in __del__
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 964, in close
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 133, in flush
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/tensorboard/summary/writer/event_file_writer.py", line 106, in flush
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/tensorboard/summary/writer/event_file_writer.py", line 156, in flush
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/tensorboard/summary/writer/record_writer.py", line 42, in flush
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py", line 230, in flush
AttributeError: 'NoneType' object has no attribute 'raise_exception_on_not_ok_status'
```
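The assertion fails inside `apply_deltas` in `detectron2/modeling/box_regression.py`. That function converts the RPN's predicted `(dx, dy, dw, dh)` offsets into boxes relative to each anchor. Below is a simplified single-box sketch of the standard Faster R-CNN decoding. It assumes unit regression weights and a `log(1000/16)` clamp on `dw`/`dh`, values assumed here to mirror detectron2's RPN defaults, so check your installed version. The name `apply_deltas_single` is hypothetical. The sketch shows why the finiteness check runs before decoding: a NaN in any delta would propagate into every output coordinate.

```python
import math
from typing import Sequence, Tuple

# Assumed default: clamp log-space size deltas at log(1000/16).
SCALE_CLAMP = math.log(1000.0 / 16)


def apply_deltas_single(
    deltas: Sequence[float],
    box: Sequence[float],
    weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0),
) -> Tuple[float, float, float, float]:
    """Decode one (dx, dy, dw, dh) delta against one (x1, y1, x2, y2) anchor."""
    # Same idea as the failing check in the traceback: refuse NaN/Inf deltas.
    if not all(math.isfinite(d) for d in deltas):
        raise AssertionError(f"non-finite deltas: {list(deltas)}")

    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    ctr_x, ctr_y = x1 + 0.5 * width, y1 + 0.5 * height

    wx, wy, ww, wh = weights
    dx, dy = deltas[0] / wx, deltas[1] / wy
    # Clamp so exp() of a large but finite delta cannot overflow to Inf.
    dw = min(deltas[2] / ww, SCALE_CLAMP)
    dh = min(deltas[3] / wh, SCALE_CLAMP)

    pred_ctr_x = dx * width + ctr_x
    pred_ctr_y = dy * height + ctr_y
    pred_w = math.exp(dw) * width
    pred_h = math.exp(dh) * height
    return (
        pred_ctr_x - 0.5 * pred_w,
        pred_ctr_y - 0.5 * pred_h,
        pred_ctr_x + 0.5 * pred_w,
        pred_ctr_y + 0.5 * pred_h,
    )


# Zero deltas return the anchor unchanged: (0.0, 0.0, 16.0, 16.0)
print(apply_deltas_single([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 16.0, 16.0]))
```

The clamp handles large but finite `dw`/`dh`. It cannot fix a NaN, because NaN compares false with everything. The deltas are network outputs, so a NaN at this point means the model's predictions were already non-finite before decoding started.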
My environment info is:

```
[10/16 09:23:16 detectron2]: Environment info:
Python                 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
Detectron2 Compiler    GCC 5.4
DETECTRON2_ENV_MODULE
PyTorch                1.3.0
PyTorch Debug Build    False
CUDA available         True
GPU 0                  GeForce GTX 1080 Ti
Pillow                 5.4.1
cv2                    4.1.1
PyTorch built with:
```