Open zhangzihui247 opened 2 years ago
Hi,
We use a V100 with 32 GB of memory, which fits 4 samples; you can scale the batch size accordingly to your GPU memory. Another trick is to use fewer sampling points, e.g. 1024, to reduce memory. Or you can accumulate gradients, e.g. two forward passes per backward pass, to keep the effective batch size large.
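The gradient-accumulation trick can be sketched with a toy model. This is plain NumPy on a toy least-squares problem, not the repo's actual PyTorch training loop; all names here are illustrative. Gradients from two micro-batches are averaged before a single parameter update, so the effective batch size doubles without increasing per-forward memory:

```python
import numpy as np

def grad(w, x, y):
    # d/dw of mean((w*x - y)^2) over one micro-batch
    return np.mean(2.0 * (w * x - y) * x)

# Toy data: true weight is 3, "full batch" of 8 samples
x = np.array([-1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, -2.0])
y = 3.0 * x

w = 0.0
lr = 0.1
accum_steps = 2  # two forward/backward passes per optimizer step

for epoch in range(200):
    acc = 0.0
    for i in range(accum_steps):
        # micro-batches of size 4 (half the effective batch each)
        xb, yb = x[i * 4:(i + 1) * 4], y[i * 4:(i + 1) * 4]
        # divide by accum_steps, equivalent to scaling the loss
        acc += grad(w, xb, yb) / accum_steps
    w -= lr * acc  # single parameter update for the accumulated gradient
```

In a real PyTorch loop the same idea is: scale the loss by `1 / accum_steps`, call `backward()` on each micro-batch, and only call `optimizer.step()` / `optimizer.zero_grad()` every `accum_steps` iterations.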
Note that a larger batch size contributes more to pre-training.
You mean you put 4 scenes on one V100 and use 8 cards, so the total batch size is 32?
exactly
Thanks, and now I have another problem. When I run ddp_main.py for S3DIS semantic segmentation training, I find that at line 85 of semseg/datasets/voxelizer.py the variable 'lim' is 4. This 'lim' seems to be used to clip the point cloud. But here the range of 'coords' is much larger: for example, 'bound_min' is [-387, 1537, -10] and 'bound_max' is [193, 1840, 326] (the range varies per point cloud). So using 4 as a radius to crop the point cloud leads to an empty cropped point cloud. BTW, I use MinkowskiEngine 0.5.4, but I checked 'ME.utils.sparse_quantize', which seems to be used for reading the input data, and its computation does not differ from the 0.4.3 version.
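For context, here is a minimal NumPy sketch (hypothetical names, not the actual voxelizer.py code) of why a fixed clip radius of 4 would empty the crop on raw coordinates at that scale: every coordinate lies far outside [-4, 4], so the mask keeps nothing unless the cloud is first shifted/rescaled (e.g. voxelized) into a range where 'lim' is meaningful.

```python
import numpy as np

# Two points at the bounds reported above; names are illustrative.
coords = np.array([[-387.0, 1537.0, -10.0],
                   [ 193.0, 1840.0, 326.0]])

lim = 4.0
# Clipping on raw coordinates: no point satisfies |coord| <= 4 on all axes.
raw_mask = np.all(np.abs(coords) <= lim, axis=1)
n_kept_raw = int(raw_mask.sum())   # 0: the crop is empty

# After shifting the minimum corner to the origin and rescaling to a
# unit box, the same radius keeps every point.
span = coords.max(axis=0) - coords.min(axis=0)
norm = (coords - coords.min(axis=0)) / span   # now in [0, 1]^3
norm_mask = np.all(np.abs(norm) <= lim, axis=1)
n_kept_norm = int(norm_mask.sum())  # 2: all points kept
```

This only illustrates the scale mismatch; whether the repo intends coordinates to be normalized or voxelized before this clip is something the maintainers would have to confirm.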
Sorry for bothering you again.
Hi, sorry for the late response.
I cannot reproduce your errors in ME 0.4.3. Please see my screenshot:
Sorry, that was my mistake. And I still have 2 questions.
Thanks, zihui
Hi, thanks for this great work. I have a question: I want to run the training code on my desktop with a single 3090 GPU. I saw you use 8 GPUs with a batch size of 32. What batch size should I set to run the pre-training code on my GPU? I set it to 2 but still ran out of CUDA memory.
Thanks, zihui