niushou opened this issue 2 years ago
Please make sure that cfg.SOLVER.CLIP_GRADIENT.ENABLED
in the config file is set to True to prevent gradient explosion.
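For reference, here is a minimal sketch of turning clipping on programmatically. It assumes the repo inherits detectron2's default solver config, where the node is spelled SOLVER.CLIP_GRADIENTS; the clip type and value below are illustrative, not settings taken from this repo:

```python
# Minimal sketch: enable gradient clipping in a detectron2-style config.
# Assumes the stock detectron2 key SOLVER.CLIP_GRADIENTS; adjust the name
# if this repo defines its own variant of the config node.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.SOLVER.CLIP_GRADIENTS.ENABLED = True
cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE = "value"  # "value" clips each gradient element; "norm" clips the global norm
cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE = 1.0     # illustrative threshold
```

The same settings can also be written directly into the YAML config file under the SOLVER section.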
Hello friend! Did you work out this error? Can you give me some advice?
Hello friend, this problem is caused by the deep-learning environment. Make sure your device environment matches what the code requires, and switch to a GPU with more memory.
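If it helps, here is a small sanity check (not from the repo) for confirming the PyTorch/CUDA/detectron2 versions and the available GPU memory before training; compare the output against the versions the repo's README asks for:

```python
# Quick environment sanity check: report torch/CUDA/detectron2 versions and
# the memory of the first visible GPU.
import torch
import detectron2

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("detectron2:", detectron2.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```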
The loss_occ_cls at the first iteration (iteration 0) is NaN:
[11/15 02:06:37 adet.trainer]: Starting training from iteration 0
Traceback (most recent call last):
  File "train_net.py", line 303, in <module>
    args=(args,),
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 286, in main
    return trainer.train()
  File "train_net.py", line 83, in train
    self.train_loop(self.start_iter, self.max_iter)
  File "train_net.py", line 73, in train_loop
    self.run_step()
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 287, in run_step
    self._write_metrics(loss_dict, data_time)
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 302, in _write_metrics
    SimpleTrainer.write_metrics(loss_dict, data_time, prefix)
  File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 339, in write_metrics
    f"Loss became infinite or NaN at iteration={storage.iter}!\n"
FloatingPointError: Loss became infinite or NaN at iteration=0!
loss_dict = {'loss_cls': 157.09115600585938, 'loss_box_reg': 5.162332534790039, 'loss_visible_mask': 3.1945271492004395, 'loss_amodal_mask': 2.944978952407837, 'loss_occ_cls': nan, 'loss_rpn_cls': 9.696294784545898, 'loss_rpn_loc': 12.890896797180176}
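In this dump only loss_occ_cls is NaN (the other terms are finite, although loss_cls is very large), so the occlusion-classification head is the place to look. detectron2's SimpleTrainer.write_metrics raises as soon as the summed losses are non-finite; a small helper like the sketch below (illustrative, not part of the repo) can be called on the model's loss dict inside a custom run_step to report exactly which terms go bad before the crash:

```python
import torch

def report_nonfinite_losses(loss_dict):
    """Raise with a list of every loss term that is NaN or Inf.

    Hypothetical debugging helper: call it on the dict the model returns,
    before the losses are summed and backpropagated.
    """
    bad = {name: float(value) for name, value in loss_dict.items()
           if not torch.isfinite(value).all()}
    if bad:
        raise FloatingPointError(f"Non-finite loss terms: {bad}")
```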