Closed: DuZzzs closed this issue 3 years ago
Hello!
You could also increase the voxel size, which increases the dimensions of each voxel and therefore reduces the total number of voxels/BEV cells in the voxel/BEV grid. This can be adjusted by adjusting https://github.com/TRAILab/CaDDN/blob/master/tools/cfgs/kitti_models/CaDDN.yaml#L12.
Note that the number of cells in each dimension of the BEV grid (H, W) needs to be divisible by 4 in order to be processed by the BEV backbone.
EDIT: I realized that this was 8 and not 4
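For anyone tuning this: a quick way to sanity-check a candidate voxel size before training is to compute the number of cells along an axis and test the divisibility constraint. A minimal sketch in plain Python, assuming the Y range of [-30.08, 30.08] from the config (the helper names here are made up for illustration):

```python
def num_bev_cells(range_min, range_max, voxel_size):
    """Number of BEV grid cells along one axis for a given voxel size."""
    return (range_max - range_min) / voxel_size

def backbone_compatible(range_min, range_max, voxel_size, factor=8):
    """True if the axis yields a whole number of cells divisible by `factor`."""
    cells = num_bev_cells(range_min, range_max, voxel_size)
    return abs(cells - round(cells)) < 1e-6 and round(cells) % factor == 0

# Y range from this thread: [-30.08, 30.08]
print(round(num_bev_cells(-30.08, 30.08, 0.16)))  # 376 cells with the default voxel size
print(backbone_compatible(-30.08, 30.08, 0.16))   # True: 376 / 8 = 47
print(backbone_compatible(-30.08, 30.08, 0.64))   # False: 94 is not divisible by 8
```

Running this against a candidate voxel size before launching training avoids the tensor-size crash described below.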
@codyreading When I set VOXEL_SIZE: [0.64, 0.64, 0.64], I get this error:
x = torch.cat(ups, dim=1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 48 and 47 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:71
Am I missing something? Thank you.
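For what it's worth, the 48-vs-47 mismatch can be traced by simulating the backbone's stride-2 stages on the grid height. This is only a sketch, assuming 3x3 stride-2 convolutions with padding 1 and stride-2 transposed convolutions (a typical SECOND-style BEV backbone, not verified against this exact repo); with a 0.64 voxel size the grid height is 60.16 / 0.64 = 94 cells:

```python
def conv_stride2(n):
    # Output length of a 3x3 conv with stride 2, padding 1 (assumed layer shape):
    # floor((n + 2*1 - 3) / 2) + 1
    return (n - 1) // 2 + 1

def upsample2(n):
    # A stride-2 transposed conv doubles the length
    return n * 2

h = 94                  # 60.16 / 0.64 cells along Y
h1 = conv_stride2(h)    # 47 after the first stride-2 stage
h2 = conv_stride2(h1)   # 24 after the second stage (47 is odd, so it rounds up)
up = upsample2(h2)      # 48 after upsampling back
print(h1, up)           # torch.cat then sees 47 vs 48 along the spatial dim and fails
```

An odd intermediate size is exactly what the divisibility-by-8 rule prevents.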
The number should be divisible by 30.08; this time it can run normally. Thank you very much.
No problem! Closing this issue then. Also FYI, expect a degradation in performance with an increased voxel size.
After I understand the code, I will try to port MobileNetV3 from torchvision's DeepLabV3. Thank you again.
What numbers are divisible by 30.08?
The number of cells in each dimension of the BEV grid (H, W) needs to be divisible by 8 in order to be processed by the BEV backbone, due to the downsample/upsample by 8. If we look at the current BEV grid height H:
H = (Y_max - Y_min) / voxel_size_y = (30.08 - (- 30.08)) / 0.16 = 376
H / 8 = 376 / 8 = 47
As long as that final number is a whole number, the BEV backbone can process the grid.
For example:
H = (Y_max - Y_min) / voxel_size_y = (30 - (- 30)) / 0.5 = 120
H / 8 = 120 / 8 = 15
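To directly answer the earlier question: it is not that numbers must be "divisible by 30.08", but that the full Y extent (60.16 m) divided by the voxel size must give a whole number of cells that is itself divisible by 8. A quick sketch for testing candidate voxel sizes (pure arithmetic, independent of the repo; the function name is made up):

```python
Y_EXTENT = 30.08 - (-30.08)  # 60.16 m, from the range discussed above

def voxel_size_ok(voxel_size, factor=8):
    """Does this voxel size give a whole number of BEV cells divisible by `factor`?"""
    cells = Y_EXTENT / voxel_size
    return abs(cells - round(cells)) < 1e-6 and round(cells) % factor == 0

print(voxel_size_ok(0.16))  # True:  376 cells, 376 / 8 = 47
print(voxel_size_ok(0.64))  # False: 94 cells, not divisible by 8
print(voxel_size_ok(0.94))  # True:  64 cells, 64 / 8 = 8
```

This also explains why [0.94, 0.94, 0.94] mentioned later in the thread runs at all: 60.16 / 0.94 = 64 cells, which divides cleanly by 8.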
@codyreading @DuZzzs Hi, could you tell me how to change the setting? I changed the voxel size to [1.0, 1.0, 1.0] but still run out of memory.
@codyreading I set VOXEL_SIZE: [0.94, 0.94, 0.94], but when training reaches epoch 6 the program reports an error.
@czy341181 How much GPU memory do you have? And what batch size are you using? A voxel size of [1.0, 1.0, 1.0] should not consume too much memory with a lower batch size.
@DuZzzs What error are you reporting?
My GPU has 11178 MB. When I set voxel_size to [0.94, 0.94, 0.94], it runs for a while and then runs out of memory.
@codyreading I didn't record that error; it was probably a cuDNN error. When I used the docker corresponding to OpenPCDet's spconv 1.0, the cuDNN error also appeared. I think my environment is inconsistent with the official documentation. I will check this cuDNN error later. Thank you very much.
@czy341181 May I ask what method you used in the end to reduce memory usage?
Hi, I set batch_size = 2, and I cannot train ResNet-50 on a 2080 Ti due to running out of memory. Do you have any way to reduce the memory usage of the code? I tried reducing the number of blocks in ResNet-50, but got loss = nan and this error:
Thank you.