TiagoCortinhal / SalsaNext

Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving
MIT License

About the memory occupied by different batch_size #30

Closed Zhen-ao closed 4 years ago

Zhen-ao commented 4 years ago

Hi @TiagoCortinhal! When I tried to use your library to run my experiment, I ran into a very strange problem. I train the network on 4 GPUs and change the batch_size in salsanext.yml.

When batch_size == 24, I found that the GPUs do not use much memory:

+-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           On   | 00000000:02:00.0 Off |                    0 |
| N/A   47C    P0   157W / 250W |   8841MiB / 22919MiB |     75%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P40           On   | 00000000:03:00.0 Off |                    0 |
| N/A   50C    P0   160W / 250W |   8283MiB / 22919MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P40           On   | 00000000:83:00.0 Off |                    0 |
| N/A   47C    P0   151W / 250W |   8273MiB / 22919MiB |     56%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P40           On   | 00000000:84:00.0 Off |                    0 |
| N/A   46C    P0   146W / 250W |   8285MiB / 22919MiB |     79%      Default |
+-------------------------------+----------------------+----------------------+

When setting batch_size == 4, I found that the GPU memory is full:

+-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           On   | 00000000:02:00.0 Off |                    0 |
| N/A   42C    P0    69W / 250W |   2103MiB / 22919MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P40           On   | 00000000:03:00.0 Off |                    0 |
| N/A   43C    P0    79W / 250W |   1981MiB / 22919MiB |     60%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P40           On   | 00000000:83:00.0 Off |                    0 |
| N/A   41C    P0    90W / 250W |   1993MiB / 22919MiB |     86%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P40           On   | 00000000:84:00.0 Off |                    0 |
| N/A   40C    P0   110W / 250W |   1981MiB / 22919MiB |     73%      Default |
+-------------------------------+----------------------+----------------------+

I hope you can clear up my confusion. Best wishes!
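As an aside, here is a minimal sketch (not SalsaNext's actual trainer code) of the multi-GPU setup this question assumes: if the trainer wraps the model in torch.nn.DataParallel, which is the standard multi-GPU approach in PyTorch, each batch read from salsanext.yml is split across the visible GPUs, so every card only holds roughly batch_size / num_gpus samples plus a replica of the model.

# Illustrative sketch only, not taken from the repository.
# With nn.DataParallel, each forward pass scatters the batch across all GPUs,
# so the per-GPU memory footprint tracks batch_size / num_gpus, not batch_size.
import torch
import torch.nn as nn

model = nn.Conv2d(5, 32, kernel_size=3, padding=1)      # stand-in for the real network
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)                       # replicates the model on every GPU
model = model.cuda()

batch_size = 24                                          # the value set in salsanext.yml
scan = torch.randn(batch_size, 5, 64, 2048).cuda()       # range-image-like input (5 channels, 64 x 2048)
out = model(scan)                                        # each GPU processes ~batch_size / num_gpus samples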

TiagoCortinhal commented 4 years ago

Hello @123zhen123.

This seems odd indeed.

Can you check the allocated vs. cached memory, with torch.cuda.memory_allocated() and torch.cuda.memory_cached(), inside the iteration loop?
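For reference, a minimal sketch of such a check inside the iteration loop (a generic snippet, not code from the repository; in newer PyTorch releases torch.cuda.memory_cached() has been renamed to torch.cuda.memory_reserved()):

import torch

def log_gpu_memory(step):
    # Print memory actually held by tensors vs. memory reserved by the caching allocator.
    for dev in range(torch.cuda.device_count()):
        allocated = torch.cuda.memory_allocated(dev) / 1024 ** 2   # MiB held by live tensors
        cached = torch.cuda.memory_cached(dev) / 1024 ** 2         # MiB reserved by the allocator
        print(f"step {step} | cuda:{dev} | allocated {allocated:.0f} MiB | cached {cached:.0f} MiB")

# Called inside the training loop, e.g.:
# for i, batch in enumerate(train_loader):
#     ...training step...
#     log_gpu_memory(i)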

Zhen-ao commented 4 years ago

> Hello @123zhen123.
>
> This seems odd indeed.
>
> Can you check the allocated vs. cached memory, with torch.cuda.memory_allocated() and torch.cuda.memory_cached(), inside the iteration loop?

Thank you, I have checked, but I did not find any problem, and this issue does not occur when training on SemanticKITTI. It may be a problem with how I define my own dataset.

But when I tried to reproduce the results by training on SemanticKITTI, I noticed the following issue in the output:

Lr: 7.035e-03 | Update: 2.509e-04 mean,1.493e-04 std | Epoch: [35][790/797] | Time 1.722 (1.706) | Data 0.086 (0.095) | Loss 0.5400 (0.6056) | acc 0.918 (0.911) | IoU 0.713 (0.645) | [2 days, 1:46:31]
Best mean iou in training set so far, save model!
| ********************************************************************************
| Validation set:
| Time avg per batch 0.798
| Loss avg 1.1969
| Jaccard avg 0.4161
| WCE avg 0.7808
| Acc avg 0.890
| IoU avg 0.555
| IoU class 0 [unlabeled] = 0.000
| IoU class 1 [car] = 0.932
| IoU class 2 [bicycle] = 0.266
| IoU class 3 [motorcycle] = 0.423
| IoU class 4 [truck] = 0.520
| IoU class 5 [other-vehicle] = 0.422
| IoU class 6 [person] = 0.595
| IoU class 7 [bicyclist] = 0.703
| IoU class 8 [motorcyclist] = 0.000
| IoU class 9 [road] = 0.932
| IoU class 10 [parking] = 0.468
| IoU class 11 [sidewalk] = 0.793
| IoU class 12 [other-ground] = 0.044
| IoU class 13 [building] = 0.850
| IoU class 14 [fence] = 0.540
| IoU class 15 [vegetation] = 0.823
| IoU class 16 [trunk] = 0.607
| IoU class 17 [terrain] = 0.664
| IoU class 18 [pole] = 0.513
| IoU class 19 [traffic-sign] = 0.452

This is the result of the 37th epoch. Although the IoU avg is 0.555, the IoU for "motorcyclist" is 0.00, and I found that the earlier epochs also report 0.0 for "motorcyclist". I don't know whether this is a normal phenomenon during training, or whether there is some other trick to it.

TiagoCortinhal commented 4 years ago

> This is the result of the 37th epoch. Although the IoU avg is 0.555, the IoU for "motorcyclist" is 0.00, and I found that the earlier epochs also report 0.0 for "motorcyclist".

Yes, this is expected: motorcyclist is the rarest class in the entire dataset, and the validation sequence does not contain many examples of it, so it is not a good estimator of that class's IoU.
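To illustrate why a barely-represented class can sit at 0.000 no matter how well the model does elsewhere, here is a generic per-class IoU computation from a confusion matrix (a sketch, not the repository's evaluator): if the validation split contains only a handful of motorcyclist points and they are all missed, the intersection is zero and so is the IoU.

# Generic per-class IoU from a confusion matrix; illustrative only.
import numpy as np

def per_class_iou(conf):
    # conf[i, j] counts points of ground-truth class i predicted as class j.
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    union = tp + fp + fn
    return np.where(union > 0, tp / np.maximum(union, 1), 0.0)

# Toy example: class 1 ("rare") has only 5 ground-truth points and the model misses all of them,
# so its IoU is 0.0 even though the overall accuracy is high.
conf = np.array([[995, 0],
                 [  5, 0]])
print(per_class_iou(conf))   # -> [0.995 0.   ]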

Zhen-ao commented 4 years ago

> > This is the result of the 37th epoch. Although the IoU avg is 0.555, the IoU for "motorcyclist" is 0.00, and I found that the earlier epochs also report 0.0 for "motorcyclist".
>
> Yes, this is expected: motorcyclist is the rarest class in the entire dataset, and the validation sequence does not contain many examples of it, so it is not a good estimator of that class's IoU.

Thank you very much for your patient answer; I think you have resolved my confusion :)