gist-ailab / uoais

Code for the paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling", ICRA 2022

FloatingPointError: Loss became infinite or NaN at iteration=0! #16

Open niushou opened 1 year ago

niushou commented 1 year ago

At iteration 0 (the very first iteration), loss_occ_cls becomes NaN:

```
[11/15 02:06:37 adet.trainer]: Starting training from iteration 0
Traceback (most recent call last):
  File "train_net.py", line 303, in <module>
    args=(args,),
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 286, in main
    return trainer.train()
  File "train_net.py", line 83, in train
    self.train_loop(self.start_iter, self.max_iter)
  File "train_net.py", line 73, in train_loop
    self.run_step()
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 287, in run_step
    self._write_metrics(loss_dict, data_time)
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 302, in _write_metrics
    SimpleTrainer.write_metrics(loss_dict, data_time, prefix)
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 339, in write_metrics
    f"Loss became infinite or NaN at iteration={storage.iter}!\n"
FloatingPointError: Loss became infinite or NaN at iteration=0!
loss_dict = {'loss_cls': 157.09115600585938, 'loss_box_reg': 5.162332534790039, 'loss_visible_mask': 3.1945271492004395, 'loss_amodal_mask': 2.944978952407837, 'loss_occ_cls': nan, 'loss_rpn_cls': 9.696294784545898, 'loss_rpn_loc': 12.890896797180176}
```
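For context, detectron2's trainer aborts when the summed losses are not finite, so a single NaN entry (here `loss_occ_cls`) is enough to trigger this error. A simplified sketch of that check (not the actual detectron2 source, just the same idea):

```python
import math

def check_losses(loss_dict, iteration):
    """Simplified version of the finiteness check in detectron2's
    SimpleTrainer.write_metrics: if any individual loss is NaN or Inf,
    the sum is non-finite and training aborts with FloatingPointError."""
    total = sum(loss_dict.values())
    if not math.isfinite(total):
        raise FloatingPointError(
            f"Loss became infinite or NaN at iteration={iteration}!\n"
            f"loss_dict = {loss_dict}"
        )

# The reported loss_dict trips the check because loss_occ_cls is NaN.
try:
    check_losses({"loss_cls": 157.09, "loss_occ_cls": float("nan")}, iteration=0)
except FloatingPointError as e:
    print(e)
```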

SeungBack commented 1 year ago

Please make sure that cfg.SOLVER.CLIP_GRADIENT.ENABLED in the config file is set to True to prevent gradient explosion.
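For reference, a minimal sketch of enabling gradient clipping through detectron2's config API. Detectron2's standard keys are `SOLVER.CLIP_GRADIENTS.*`; the key name referenced above (`SOLVER.CLIP_GRADIENT`) may be spelled differently in the uoais YAML configs, so treat the exact names below as an assumption and check the repo's config files:

```python
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.SOLVER.CLIP_GRADIENTS.ENABLED = True       # turn gradient clipping on
cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE = "value"  # clip each gradient element
cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE = 1.0     # into the range [-1.0, 1.0]
print(cfg.SOLVER.CLIP_GRADIENTS)
```

The equivalent change in a YAML config would set `ENABLED: True` under the corresponding `SOLVER` clipping section.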

Lilzhuzixi commented 3 months ago

Hello friend! Did you work out this error? Could you give me some advice?

niushou commented 3 months ago

Hello friend, this problem is caused by the deep learning environment. Make sure your environment matches the code requirements, and switch to a GPU with more memory.
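As a quick sanity check, here is a small sketch (standard PyTorch calls only, nothing uoais-specific) for inspecting the installed PyTorch/CUDA versions and the available GPU memory:

```python
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("Total memory (GiB):", round(props.total_memory / 1024**3, 1))
```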
