HuguesTHOMAS / KPConv

Kernel Point Convolutions

Allocator (GPU_0_bfc) ran out of memory #90

Closed: zhouhao957 closed this issue 4 years ago

zhouhao957 commented 4 years ago

Hello, I'm very sorry to bother you, but I ran into a problem:

`Allocator (GPU_0_bfc) ran out of memory trying to allocate 6.36GiB. Current allocation summary follows.` [...]

- Limit: 10508668109
- InUse: 8296583424
- MaxInUse: 9055883776
- NumAllocs: 1190
- MaxAllocSize: 6826283264

This seems to mean the GPU memory is insufficient.

Situation: I slightly modified your code so that it uses my own data. Several inputs have been added, with `subsampling_parameter=0` and `batch_num=1`. The error above occurs at this line in `trainer.py`:

`_, L_out, L_reg, L_p, probs, labels, acc = self.sess.run(ops, {model.dropout_prob: 0.5})`

My computer: RTX 2080 Ti (11 GB video memory), Intel i9 CPU, 16 GB RAM, TensorFlow 1.12.0, CUDA 9.0, cuDNN 7.4.

I don't want to buy more graphics cards or reduce the depth of the network. Can you give me some suggestions to reduce the GPU memory usage of the program? Thank you!

HuguesTHOMAS commented 4 years ago

Hi @zhouhao957,

So if I understand correctly, you do not subsample the input point cloud. This is not very memory-safe: if your input has high-density areas, some of your convolution neighborhoods will be too big.

You can try to reduce the input radius or the convolution radius, or introduce some subsampling to reduce the number of points. Alternatively, you can try to reduce the number of features the network uses (`first_features_dim`) and use a smaller network with fewer layers.
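For reference, here is a minimal sketch of what such memory-saving settings could look like. The attribute names are assumptions modeled on typical KPConv training configurations (`in_radius`, `first_subsampling_dl`, `conv_radius`, `first_features_dim`, `batch_num`); check them against the config class your training script actually uses.

```python
# Minimal sketch of memory-saving configuration choices.
# Attribute names are assumptions modeled on usual KPConv training configs;
# adapt them to the config class used by your training script.

class MemorySaverConfig:
    in_radius = 1.5              # smaller input sphere -> fewer points per input
    first_subsampling_dl = 0.08  # coarser first grid subsampling (do not leave it at 0)
    conv_radius = 2.5            # smaller convolution radius -> smaller neighborhoods
    first_features_dim = 64      # fewer feature channels in the first layers
    batch_num = 1                # smallest batch size
    # A shallower network would be obtained by shortening the list of blocks
    # in the architecture definition rather than by a single parameter.
```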

Hope this helps!

vvaibhav08 commented 4 years ago

Hi @HuguesTHOMAS,

I come across the OOM error every now and then during prediction. I am dealing with point clouds that have a much higher density of points than NPM3D, for example. Apart from lowering the batch size or increasing the first subsampling parameter, do you think lowering the percentile limit for `n_max` below the 80th percentile (`keep_ratio=0.8` in the `calibrate_neighbors` function in `common.py`), which you explained in this comment on calibrating neighbors, would help without any significant loss of accuracy?

I think I get these errors precisely because some regions in my dataset are way too dense, so the prediction runs fine until it encounters such an area.

Thanks for sharing your awesome work btw!

HuguesTHOMAS commented 4 years ago

Hi @vvaibhav08,

You are right that this parameter is the last one that can influence memory consumption, after the batch size and the first subsampling / input radius. In my own experiments, I found that keep_ratio=0.8 is very effective even on extremely uneven datasets like Semantic3D. You could try to lower it even further, but I don't think it would help much more than it already does. The reason is simple: if you look at the distribution of neighborhood sizes in your dataset, it will usually look like the right side of a Gaussian, something with a shape like this:

*(image: distribution of neighborhood sizes, with the largest 20% shaded in red and the next 20% in green)*

The 20% largest neighborhoods are the area in red, and the new n_max is nearly divided by two. The next 20% biggest neighborhoods are in green, and as you can see, the value of n_max you get at the 60th percentile is very close to the one at the 80th percentile. This is why, in my opinion, you won't gain much by lowering the keep_ratio parameter.
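To make this concrete, here is a small illustration (not taken from `common.py`, just a synthetic half-Gaussian distribution of neighborhood sizes) of how the n_max values obtained at different percentiles compare:

```python
import numpy as np

# Synthetic neighborhood sizes shaped like the right half of a Gaussian
# (illustration only, not the statistics of any real dataset).
rng = np.random.default_rng(0)
neighb_sizes = np.abs(rng.normal(0.0, 20.0, size=100000)).astype(np.int32)

# n_max is the neighborhood size below which a fraction keep_ratio of the
# neighborhoods fall (keep_ratio = 0.8 corresponds to the 80th percentile).
for keep_ratio in [1.0, 0.8, 0.6]:
    n_max = int(np.percentile(neighb_sizes, 100 * keep_ratio))
    print(f"keep_ratio = {keep_ratio:.1f} -> n_max = {n_max}")

# The drop from the maximum to the 80th percentile is much larger than the
# further drop from the 80th to the 60th percentile, which is why lowering
# keep_ratio below 0.8 saves comparatively little memory.
```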

Anyway, I have another question for you that might solve your problem. I understand you are using your own data. Are you subsampling it before feeding it to the network? The first subsampling ratio is not applied automatically: the first layer of the network assumes that the data was already subsampled beforehand. So if you don't subsample your data, that could explain your OOM errors. This is the job of the load_subsampled_clouds function that I have in all my datasets. If you are curious, I also gave a link to my SemanticKitti implementation, where I do this subsampling online on each input cloud before feeding it to the input generator.
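As an illustration of that first-pass subsampling, a sketch could look roughly like the code below. It assumes the `grid_subsampling` helper from `datasets/common.py` mentioned in this thread, with a `sampleDl` argument for the grid size and hypothetical input files; check the exact signature and return values against your version of the code.

```python
import numpy as np
from datasets.common import grid_subsampling  # helper from this repository (check the import path)

# Hypothetical raw point cloud: N x 3 coordinates with per-point labels.
points = np.load('my_cloud_points.npy').astype(np.float32)
labels = np.load('my_cloud_labels.npy').astype(np.int32)

# Subsample once with the same grid size as the first network layer
# (first_subsampling_dl) BEFORE feeding the cloud to the input pipeline:
# the first layer of the network assumes this has already been done.
sub_points, sub_labels = grid_subsampling(points, labels=labels, sampleDl=0.06)

print('Before:', points.shape[0], 'points / After:', sub_points.shape[0], 'points')
```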

I hope this helps.

Best, Hugues

vvaibhav08 commented 4 years ago

@HuguesTHOMAS I do subsample the data using the grid_subsampling function in the dataset script. It's just that my data is too dense in some cases, which leads to the memory issues. It would be great if you could share the link anyway; it would help me understand the subsampling better.

Thanks!

HuguesTHOMAS commented 4 years ago

https://drive.google.com/open?id=12npkHHnqzhhl5i-2q_RD-Cw_urUdWC0J