WangYueFt / dcp

run out of memory when training DCP-v2 #1

Closed: amiltonwong closed this 5 years ago

amiltonwong commented 5 years ago

Hi, @WangYueFt ,

I can train DCP-v1 without any issues. However, when I run `python main.py --exp_name=dcp_v2 --model=dcp --emb_nn=dgcnn --pointer=transformer --head=svd`, I get the following GPU out-of-memory error:

  File "/root/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/code9/dcp/model.py", line 232, in forward
    dropout=self.dropout)
  File "/data/code9/dcp/model.py", line 27, in attention
    scores = torch.matmul(query, key.transpose(-2, -1).contiguous()) / math.sqrt(d_k)
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 11.91 GiB total capacity; 10.80 GiB already allocated; 5.44 MiB free; 64.97 MiB cached)
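
For context, the failing 512 MiB allocation matches the size of the attention score matrix built on the `model.py` line above. A back-of-envelope estimate, assuming the repo's defaults of 1024 points per cloud, batch size 32, 4 attention heads, and float32 (these defaults are assumptions, not confirmed in this thread):

```python
# Rough size of the "scores" tensor from model.py line 27 in the traceback.
# All values below are assumed defaults, not confirmed in this thread.
batch_size = 32       # assumed default --batch_size
n_heads = 4           # assumed number of heads in the transformer pointer
n_points = 1024       # assumed points sampled per cloud
bytes_per_elem = 4    # float32

# scores has shape (batch_size, n_heads, n_points, n_points)
score_bytes = batch_size * n_heads * n_points * n_points * bytes_per_elem
print(f"{score_bytes / 2**20:.2f} MiB")  # -> 512.00 MiB, matching the failed allocation
```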

How much GPU memory is required to train dcp_v2? According to your paper, a GTX 1070 GPU (8 GB) was used, but my system has a Titan Xp (12 GB).

THX!

WangYueFt commented 5 years ago

Hi,

DCP-v2 was trained on two Tesla P100s, while inference time was measured on a single GTX 1070. You can reduce the batch size, or use DCP-v1 on a single Titan Xp.
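
For example, the reporter's original command with a smaller batch (8 is the value reported to work further down; the right setting for other GPUs is a guess):

```
python main.py --exp_name=dcp_v2 --model=dcp --emb_nn=dgcnn --pointer=transformer --head=svd --batch_size=8
```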

Best, Yue

amiltonwong commented 5 years ago

Thanks, Yue. I changed it to `--batch_size=8` (which consumes around 7.3 GB of GPU memory), and the training can proceed.
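
That is roughly consistent with the estimate above: dropping the batch from 32 to 8 shrinks every per-batch activation, including the 512 MiB attention score tensor, by a factor of 4, while the model weights themselves stay fixed.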