Open Jayku88 opened 1 year ago
May I know if you modified the code or the config file?
Yes, the following are the modifications made to the config file config/semantic_kitti/semantic_kitti_unet32_spherical_transformer.yaml
I wonder whether the NaN happens because the batch_size is too small, since only a single GPU is used. Can you try using more GPUs for training?
[09/14 17:15:17 main-logger]: Epoch: [1/2][1310/19130] Data 0.001 (0.002) Batch 1.075 (1.127) Remain 11:33:44 Loss 1.0926 Lr: [0.00581479, 0.00058148] Accuracy 0.6878.
NaN or Inf found in input tensor.
[09/14 17:15:27 main-logger]: Epoch: [1/2][1320/19130] Data 0.001 (0.002) Batch 1.000 (1.126) Remain 11:33:00 Loss nan Lr: [0.00581338, 0.00058134] Accuracy 0.0689.
NaN or Inf found in input tensor.
[09/14 17:15:37 main-logger]: Epoch: [1/2][1330/19130] Data 0.001 (0.002) Batch 0.932 (1.125) Remain 11:32:20 Loss 0.6963 Lr: [0.00581196, 0.0005812] Accuracy 0.7945.
NaN or Inf found in input tensor.
[09/14 17:15:47 main-logger]: Epoch: [1/2][1340/19130] Data 0.001 (0.002) Batch 0.963 (1.124) Remain 11:31:31 Loss 1.0028 Lr: [0.00581054, 0.00058105] Accuracy 0.6936.
NaN or Inf found in input tensor.
[09/14 17:15:57 main-logger]: Epoch: [1/2][1350/19130] Data 0.001 (0.002) Batch 0.900 (1.123) Remain 11:30:41 Loss 1.0634 Lr: [0.00580913, 0.00058091] Accuracy 0.6222.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
[09/14 17:16:08 main-logger]: Epoch: [1/2][1360/19130] Data 0.001 (0.002) Batch 1.074 (1.123) Remain 11:30:32 Loss 0.8774 Lr: [0.00580771, 0.00058077] Accuracy 0.7543.
NaN or Inf found in input tensor.
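In case it helps narrow this down, here is a rough sketch for pulling out the iterations where the logged loss is NaN or Inf. It is not part of the repository. The function name `nonfinite_loss_steps` and the regular expression are my own, and they assume the log keeps the `Epoch: [e/E][i/N] ... Loss <value> Lr:` layout shown above.

```python
import math
import re

# Matches one main-logger entry and captures epoch, iteration, and loss.
# Assumes the "Epoch: [e/E][i/N] ... Loss <value> Lr:" layout from the log above.
ENTRY = re.compile(r"Epoch: \[(\d+)/\d+\]\[(\d+)/\d+\].*?Loss (\S+) Lr:")


def nonfinite_loss_steps(log_text):
    """Return (epoch, iteration) pairs whose logged loss is NaN or Inf."""
    steps = []
    for match in ENTRY.finditer(log_text):
        epoch = int(match.group(1))
        iteration = int(match.group(2))
        loss = float(match.group(3))  # float() accepts "nan" and "inf"
        if not math.isfinite(loss):
            steps.append((epoch, iteration))
    return steps


# Two entries copied from the log above.
sample = (
    "[09/14 17:15:17 main-logger]: Epoch: [1/2][1310/19130] Data 0.001 (0.002) "
    "Batch 1.075 (1.127) Remain 11:33:44 Loss 1.0926 "
    "Lr: [0.00581479, 0.00058148] Accuracy 0.6878.\n"
    "[09/14 17:15:27 main-logger]: Epoch: [1/2][1320/19130] Data 0.001 (0.002) "
    "Batch 1.000 (1.126) Remain 11:33:00 Loss nan "
    "Lr: [0.00581338, 0.00058134] Accuracy 0.0689.\n"
)

print(nonfinite_loss_steps(sample))  # [(1, 1320)]
```

Running this over the full log file shows whether the NaN loss is isolated to a few iterations or recurs, which may help decide whether batch size is the likely cause.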