WindVChen / DRENet

The official implementation of DRENet (Degraded Reconstruction Enhancement Network) for tiny ship detection in remote sensing images
GNU General Public License v3.0

performance decrease during training #12

Open ramdhan1989 opened 1 year ago

ramdhan1989 commented 1 year ago

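Hi, do you have any suggestions for overcoming this problem during training? The metrics improve for the first few epochs, then the losses become nan from epoch 7 onward and P, R, and mAP drop to 0.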

Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     0/199       11G    0.1279   0.01601         0  0.008378     2.849         6       512: 100%|█| 1800/1800 [14:48<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:07<00
30.39782691001892
                 all    2.75e+03    4.51e+03           0           0    5.13e-06    9.81e-07

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     1/199       11G    0.1261   0.01524         0  0.005636     2.846         6       512: 100%|█| 1800/1800 [14:03<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:01<00
31.465840816497803
                 all    2.75e+03    4.51e+03           0           0    3.55e-06    6.64e-07

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     2/199       11G    0.1214   0.01546         0  0.005382     2.844        14       512: 100%|█| 1800/1800 [13:46<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:03<00
32.228920221328735
                 all    2.75e+03    4.51e+03       0.321       0.297       0.194      0.0497

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     3/199       11G    0.1142   0.01436         0  0.005227     2.839        20       512: 100%|█| 1800/1800 [13:39<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:58<00
28.6451997756958
                 all    2.75e+03    4.51e+03       0.316       0.485       0.345      0.0999

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     4/199       11G   0.09978   0.01415         0  0.005147     2.832         7       512: 100%|█| 1800/1800 [13:23<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:57<00
28.444270849227905
                 all    2.75e+03    4.51e+03       0.408       0.578       0.472       0.167

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     5/199       11G   0.09265   0.01457         0  0.005125     2.829         5       512: 100%|█| 1800/1800 [13:32<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:02<00
30.84639859199524
                 all    2.75e+03    4.51e+03       0.399       0.623       0.507       0.161

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     6/199       11G   0.08306   0.01727         0  0.005281     2.825        10       512: 100%|█| 1800/1800 [13:44<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:01<00
30.013824462890625
                 all    2.75e+03    4.51e+03       0.285       0.589       0.453       0.145

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     7/199       11G       nan       nan         0  0.005711       nan         6       512: 100%|█| 1800/1800 [13:36<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:51<00
31.282738208770752
                 all    2.75e+03    4.51e+03           0           0    1.57e-06    1.74e-07

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     8/199       11G       nan       nan         0       nan       nan        10       512: 100%|█| 1800/1800 [13:31<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:49<00
32.83151125907898
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     9/199       11G       nan       nan         0       nan       nan         9       512: 100%|█| 1800/1800 [13:20<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.580291509628296
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    10/199       11G       nan       nan         0       nan       nan         4       512: 100%|█| 1800/1800 [13:25<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:48<00
32.03327965736389
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    11/199       11G       nan       nan         0       nan       nan         9       512: 100%|█| 1800/1800 [13:28<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:47<00
30.341226816177368
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    12/199       11G       nan       nan         0       nan       nan         2       512: 100%|█| 1800/1800 [13:11<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.359901189804077
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    13/199       11G       nan       nan         0       nan       nan        13       512: 100%|█| 1800/1800 [13:05<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.436581134796143
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    14/199       11G       nan       nan         0       nan       nan         7       512: 100%|█| 1800/1800 [13:04<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.631073713302612
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    15/199       11G       nan       nan         0       nan       nan         6       512: 100%|█| 1800/1800 [13:08<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.1485652923584
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    16/199       11G       nan       nan         0       nan       nan        18       512: 100%|█| 1800/1800 [13:14<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:46<00
29.673731088638306
                 all    2.75e+03           0           0           0           0           0
WindVChen commented 1 year ago

There seems to be a gradient explosion (or something else) that leads to a NaN loss value. What about turning down the learning rate, or clipping the gradients before optimizer.step()?
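To make the suggestion concrete, here is a minimal, self-contained PyTorch sketch of both ideas: a lowered learning rate and gradient-norm clipping right before `optimizer.step()`, plus a guard that skips a batch whose loss is not finite. It is not taken from DRENet's `train.py`; the toy `nn.Linear` model, the `train_step` helper, the learning rate, and `max_norm=10.0` are all placeholder assumptions that only illustrate where the calls go.

```python
import torch
import torch.nn as nn


def train_step(model, optimizer, loss_fn, inputs, targets, max_norm=10.0):
    """Run one update with a non-finite-loss guard and gradient clipping.

    Hypothetical helper for illustration; returns (loss, grad_norm_before_clip),
    or None if the batch was skipped because the loss was NaN/inf.
    """
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    if not torch.isfinite(loss):
        # Skip the update so one bad batch cannot write NaNs into the weights.
        return None
    loss.backward()
    # Rescale all gradients in place so their combined L2 norm is <= max_norm.
    # The return value is the norm measured before clipping.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item(), float(grad_norm)


torch.manual_seed(0)
model = nn.Linear(4, 1)                                    # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)   # assumed, lowered learning rate
loss_fn = nn.MSELoss()

x = torch.randn(8, 4) * 1e3   # deliberately large inputs to provoke large gradients
y = torch.randn(8, 1)
result = train_step(model, optimizer, loss_fn, x, y, max_norm=10.0)
print(result)
```

If the training loop uses mixed precision with `torch.cuda.amp.GradScaler`, the gradients are still scaled after `scaler.scale(loss).backward()`, so `scaler.unscale_(optimizer)` should be called before `clip_grad_norm_`, followed by `scaler.step(optimizer)` and `scaler.update()`. Logging the returned pre-clip norm for a few epochs may also help tell whether the gradients really blow up shortly before the first nan appears.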