autonomousvision / differentiable_volumetric_rendering

This repository contains the code for the CVPR 2020 paper "Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision"
http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf
MIT License

CUDA out of memory during training before evaluation #9

Closed hiyyg closed 4 years ago

hiyyg commented 4 years ago

Hi @m-niemeyer , I tried to train with configs/single_view_reconstruction/multi_view_supervision/ours_combined.yaml and reduced the training and testing batch size to 16. However, during training, every time right before evaluation, the following runtime error occurs:

RuntimeError: CUDA out of memory. Tried to allocate 7.06 GiB (GPU 0; 10.76 GiB total capacity; 7.46 GiB already allocated; 2.06 GiB free; 7.83 GiB reserved in total by PyTorch)

What could be the problem?

hiyyg commented 4 years ago

I can only run evaluation with a batch size <= 2. Why does evaluation cost so much memory?

m-niemeyer commented 4 years ago

Hi @hiyyg , thanks a lot for your interest. First, note that the configs for the large models are optimized for GPUs with 32GB memory. Here are some ideas on how to reduce the memory load:

  1. Reduce the training and validation batch sizes in the config, e.g.
    training:
      batch_size: 16
      batch_size_val: 4
  2. Reduce the maximum number of points processed in parallel in the depth prediction step, e.g.
    model:
      depth_function_kwargs:
        max_points: 10000
  3. Reduce the hidden dimension of the model when training new models. A smaller model also trains much faster. Set e.g.
    model:
      decoder_kwargs:
        hidden_size: 128
  4. Reduce the number of training and evaluation points, e.g.
    training:
      n_training_points: 512
      n_eval_points: 512
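Taken together, the four reductions above would look like this as a single config override (a sketch following the key layout shown in the items above; the values are examples to tune for your GPU, not recommended settings):

```yaml
training:
  batch_size: 16
  batch_size_val: 4
  n_training_points: 512
  n_eval_points: 512
model:
  decoder_kwargs:
    hidden_size: 128
  depth_function_kwargs:
    max_points: 10000
```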

I hope this helps and you can find a setting that is suitable for your hardware! Regarding your question: the validation step requires more GPU memory than training in the early stages because we adaptively increase the ray sampling resolution during training, starting at a small resolution (16) that grows over time (up to 128), whereas the validation step is always performed at the highest resolution (128). You can see this in the depth function implementation.
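The quadratic growth in resolution explains the gap. A back-of-the-envelope sketch in plain Python (the number of depth candidates per ray is an assumed illustrative value, not taken from the repository):

```python
def points_per_batch(batch_size, ray_resolution, steps_per_ray=129):
    """Rough count of network evaluations for one depth-prediction pass.

    Each image is sampled on a ray_resolution x ray_resolution grid of rays,
    and each ray is evaluated at steps_per_ray depth candidates
    (steps_per_ray is a hypothetical value for illustration).
    """
    return batch_size * ray_resolution ** 2 * steps_per_ray

# Early training uses a coarse ray resolution (16); validation always uses 128.
train_early = points_per_batch(16, 16)
validation = points_per_batch(16, 128)

# Resolution grows 8x (16 -> 128), so the point count grows 8^2 = 64x.
print(validation // train_early)  # -> 64
```

This is why an OOM can appear only at evaluation time even though training itself fits comfortably at the same batch size.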

hiyyg commented 4 years ago

Thanks for your reply. Does that mean the training batch size should be set no larger than the testing batch size, since otherwise training will run out of memory later?

m-niemeyer commented 4 years ago

You are right that memory consumption will increase later during training, but it is not exactly the same as what the validation step needs; it also depends on the number of training / validation points you use (see point 4 above). All of the points in the previous message can be used to reduce the memory load for both training and testing. Good luck with your research!

hiyyg commented 4 years ago

Hi @m-niemeyer , may I ask: for single-view reconstruction with multi-view supervision on the 3D-R2N2 ShapeNet dataset, what are the final evaluation loss values of your model once it has converged?

m-niemeyer commented 4 years ago

Hi @hiyyg , it should be loss_depth_eval: 0.033 for our 2.5D supervised model and mask_intersection: 0.973 for our 2D supervised model. I hope this is what you were looking for!