Hello @123zhen123. This seems odd indeed. Can you check the allocated vs. cached memory, with torch.cuda.memory_allocated() and torch.cuda.memory_cached(), inside the iteration loop?
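For reference, a minimal sketch of that check inside a training loop (train_loader, model, criterion, and optimizer are generic placeholders, not SalsaNext's actual names):

```python
import torch

# Sketch: report allocated vs. cached GPU memory at every iteration.
for i, (scan, label) in enumerate(train_loader):
    output = model(scan.cuda())
    loss = criterion(output, label.cuda())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # memory_allocated(): memory occupied by live tensors.
    # memory_cached(): memory held by the caching allocator
    # (renamed to memory_reserved() in newer PyTorch versions).
    print(f"iter {i}: "
          f"allocated={torch.cuda.memory_allocated() / 1e9:.2f} GB, "
          f"cached={torch.cuda.memory_cached() / 1e9:.2f} GB")
```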
Thank you, I have checked, but found no problem; this issue does not occur when training on SemanticKITTI, so it may be a problem with how I define my own dataset. However, when I tried to reproduce the results by training on SemanticKITTI, the run showed the following problem:
Lr: 7.035e-03 | Update: 2.509e-04 mean,1.493e-04 std | Epoch: [35][790/797] | Time 1.722 (1.706) | Data 0.086 (0.095) | Loss 0.5400 (0.6056) | acc 0.918 (0.911) | IoU 0.713 (0.645) | [2 days, 1:46:31]
Best mean iou in training set so far, save model!
| ********************************************************************************
| Validation set:
| Time avg per batch 0.798
| Loss avg 1.1969
| Jaccard avg 0.4161
| WCE avg 0.7808
| Acc avg 0.890
| IoU avg 0.555
| IoU class 0 [unlabeled] = 0.000
| IoU class 1 [car] = 0.932
| IoU class 2 [bicycle] = 0.266
| IoU class 3 [motorcycle] = 0.423
| IoU class 4 [truck] = 0.520
| IoU class 5 [other-vehicle] = 0.422
| IoU class 6 [person] = 0.595
| IoU class 7 [bicyclist] = 0.703
| IoU class 8 [motorcyclist] = 0.000
| IoU class 9 [road] = 0.932
| IoU class 10 [parking] = 0.468
| IoU class 11 [sidewalk] = 0.793
| IoU class 12 [other-ground] = 0.044
| IoU class 13 [building] = 0.850
| IoU class 14 [fence] = 0.540
| IoU class 15 [vegetation] = 0.823
| IoU class 16 [trunk] = 0.607
| IoU class 17 [terrain] = 0.664
| IoU class 18 [pole] = 0.513
| IoU class 19 [traffic-sign] = 0.452
This is the result of the 37th epoch. Although "IoU avg = 0.555", the IoU for "motorcyclist" is 0.00. Similarly, I found that the results of earlier epochs also show "motorcyclist" at 0.0. I don't know whether this is normal during training, or whether there is some other trick involved?
Yes, this is expected: motorcyclist is the rarest class in the entire dataset, and the validation sequence doesn't have many examples, so it is not a good estimator for that class's IoU.
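For intuition: per-class IoU is TP / (TP + FP + FN), so a class with only a handful of ground-truth points in the validation sequence gives a very noisy estimate, and missing those few points drives it straight to 0 even while the mean IoU stays reasonable. A small illustrative sketch (the counts are made up, not taken from the run above):

```python
import numpy as np

def per_class_iou(conf):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c), from a confusion matrix (rows = GT, cols = pred)."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    return tp / np.maximum(tp + fp + fn, 1)

# Hypothetical counts for a common class vs. a rare one:
conf = np.array([
    [9500, 500],  # common class: 9500 points correct, 500 confused
    [  50,   0],  # rare class: only 50 GT points, all mispredicted
])
print(per_class_iou(conf))  # -> [0.945..., 0.0]
```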
Thank you very much for your patient answer; I think you have resolved my confusion :)
Hi @TiagoCortinhal, when I tried to use your library to run the experiment, I found a very strange problem: I train the network on 4 GPUs and change batch_size in salsanext.yml. When batch_size == 24, the GPUs do not use much memory, but when batch_size == 4, GPU memory is full. I hope you can resolve my confusion. Best wishes.
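Not an answer, but one way to narrow this down is to log the memory state of every visible GPU inside the training loop, since with a data-parallel wrapper the per-GPU footprint depends on how the configured batch is split (or multiplied) across devices. A hedged diagnostic sketch, not SalsaNext-specific code:

```python
import torch

def report_gpu_memory(tag=""):
    # Print allocated vs. reserved memory for every visible GPU.
    for dev in range(torch.cuda.device_count()):
        alloc = torch.cuda.memory_allocated(dev) / 1e9
        reserved = torch.cuda.memory_reserved(dev) / 1e9  # memory_cached() in older PyTorch
        print(f"{tag} GPU{dev}: allocated={alloc:.2f} GB, reserved={reserved:.2f} GB")

# Example usage inside the loop: report_gpu_memory(tag=f"iter {i}")
```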