facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

AssertionError on training progress #84

Closed kingwpf closed 5 years ago

kingwpf commented 5 years ago

I was trying to train a Faster R-CNN model on our custom dataset. Here is the error output:

`[10/16 09:24:28 d2.engine.hooks]: Overall training speed: 326 iterations in 0:01:04 (0.1975 s / it)
[10/16 09:24:28 d2.engine.hooks]: Total training time: 0:01:04 (0:00:00 on hooks)
Traceback (most recent call last):
  File "projects/FasterRcnnr50/train_net.py", line 181, in <module>
    args=(args,),
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/engine/launch.py", line 52, in launch
    main_func(args)
  File "projects/FasterRcnnr50/train_net.py", line 169, in main
    return trainer.train()
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 329, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 212, in run_step
    loss_dict = self.model(data)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 82, in forward
    proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 172, in forward
    outputs.predict_proposals(),
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn_outputs.py", line 416, in predict_proposals
    pred_anchor_deltas_i, anchors_i.tensor
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/modeling/box_regression.py", line 79, in apply_deltas
    assert torch.isfinite(deltas).all().item()
AssertionError
Exception ignored in: <function TensorboardXWriter.__del__ at 0x7f13a3406ae8>
Traceback (most recent call last):
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/detectron2/utils/events.py", line 118, in __del__
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 964, in close
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 133, in flush
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/tensorboard/summary/writer/event_file_writer.py", line 106, in flush
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/tensorboard/summary/writer/event_file_writer.py", line 156, in flush
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/tensorboard/summary/writer/record_writer.py", line 42, in flush
  File "/home/srwpf/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py", line 230, in flush
AttributeError: 'NoneType' object has no attribute 'raise_exception_on_not_ok_status'
`

My environment info is:

`[10/16 09:23:16 detectron2]: Environment info:
Python                   3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
Detectron2 Compiler      GCC 5.4
DETECTRON2_ENV_MODULE
PyTorch                  1.3.0
PyTorch Debug Build      False
CUDA available           True
GPU 0                    GeForce GTX 1080 Ti
Pillow                   5.4.1
cv2                      4.1.1
PyTorch built with:
`

kingwpf commented 5 years ago

Did I hit a NaN value during training?

ppwwyyxx commented 5 years ago

Yes.
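For context, the failing line in the traceback is `assert torch.isfinite(deltas).all().item()`, which passes only when every predicted box delta is finite. A single NaN or inf in the tensor trips it. The snippet below is a minimal stdlib-only sketch of the same idea, not the detectron2 code: the helper name `all_finite` and the sample values are hypothetical.

```python
import math


def all_finite(values):
    """Return True only if no value is NaN or +/-inf (hypothetical helper)."""
    return all(math.isfinite(v) for v in values)


# Made-up deltas for illustration; real ones come from the RPN head.
good_deltas = [0.1, -0.3, 0.05, 0.2]
bad_deltas = [0.1, float("nan"), 0.05, float("inf")]

print(all_finite(good_deltas))
print(all_finite(bad_deltas))

# The assertion in the traceback behaves like this check:
# it raises AssertionError as soon as one value is not finite.
assert all_finite(good_deltas)
```

Running the same kind of check on the losses or model outputs a few iterations earlier can help narrow down where the non-finite values first appear.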

kingwpf commented 5 years ago

@ppwwyyxx I used this dataset to train cascade_rcnn in mmdetection without errors. How can I avoid this issue? Can I set a clip_normal value in detectron2?
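To make the question concrete: by clip_normal I presumably mean clipping gradients by their global norm, which is one common way to limit exploding gradients. Whether it would fix this particular run is not known. The sketch below shows the technique in plain Python under that assumption; `clip_by_global_norm` is a hypothetical helper, not a detectron2 function. In PyTorch the corresponding utility is `torch.nn.utils.clip_grad_norm_`.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale grads so their L2 norm is at most max_norm (hypothetical helper).

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already within the limit: leave the gradients unchanged.
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Made-up gradient values for illustration.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm_before)
print(clipped)
```

Note that clipping only bounds large but finite gradients. If a gradient is already NaN, the norm is NaN as well and the rescaled values stay NaN, so the cause of the non-finite values still has to be found.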

ppwwyyxx commented 5 years ago

I do not know, and we in general do not help people find the right settings to train their models.

kingwpf commented 5 years ago

Ok, thank you.