AlexHex7 / Non-local_pytorch

Implementation of Non-local Block.
Apache License 2.0

About GPU memory usage #46

Open QAQEthan opened 2 years ago

QAQEthan commented 2 years ago

If the non-local block is applied to a low-level feature map, CUDA runs out of memory. Is this due to the amount of memory required to compute the attention matrix? Looking forward to your reply.

buncybunny commented 2 years ago

I'm also experiencing a CUDA out of memory issue with the non-local block. I'm trying to use it at the top of my network, in the bbox regression conv head of Faster R-CNN. Do you guys have any ideas to address this?

AlexHex7 commented 2 years ago

@Monkey-D-Luffy-star @vombategeht Hi~

The larger the size (height, width, depth) of the feature maps, the more memory the matrix multiplication will occupy.
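
To make this concrete, here is a back-of-the-envelope estimate (my own sketch, assuming float32 and a single pairwise matrix of shape (T·H·W) x (T·H·W), as in the embedded-Gaussian / dot-product versions):

```python
# Rough size of the pairwise attention matrix (assumption: float32, no sub-sampling).
def attention_matrix_gib(t, h, w, batch_size=1, bytes_per_elem=4):
    n = t * h * w                               # number of positions (tokens)
    return batch_size * n * n * bytes_per_elem / 1024 ** 3

# A low-level 2D feature map of 128x128 (t=1) already needs ~1 GiB per sample:
print(attention_matrix_gib(1, 128, 128))        # ~1.0 GiB
# The same block on a 32x32 map needs only ~0.004 GiB:
print(attention_matrix_gib(1, 32, 32))
```

Because the cost is quadratic in the number of positions, halving height and width cuts the matrix by 16x.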

When I encounter this problem, I will:

  1. reduce the batch size
  2. downsample the feature maps (see the down-sampling sketch after this list)
  3. move the non-local block to a higher-level position
  4. make some optimizations, for example following the ideas of these papers:
     4.1. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
     4.2. Compact Generalized Non-local Network
  5. follow the idea of the transformer block: split the tokens (height x width x depth) into several groups, then do self-attention within each group (see the grouped-attention sketch after this list)
  6. or directly try using a transformer block
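
For idea 2, a rough sketch of down-sampling around the block (the `NONLocalBlock2D` import path and its default `sub_sample=True` behaviour, which pools the key/value branches internally, are assumptions from memory of this repo; double-check against the code):

```python
import torch
import torch.nn.functional as F
from lib.non_local_embedded_gaussian import NONLocalBlock2D  # import path assumed

block = NONLocalBlock2D(in_channels=64)       # sub_sample=True is the default, if I recall

x = torch.randn(2, 64, 128, 128)              # low-level feature map: 16384 positions
x_small = F.avg_pool2d(x, kernel_size=4)      # 32x32 -> only 1024 positions in the pairwise matrix
y_small = block(x_small)
y = F.interpolate(y_small, size=x.shape[2:], mode='bilinear', align_corners=False)
print(y.shape)                                # torch.Size([2, 64, 128, 128])
```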
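
For idea 5, a minimal sketch of window-style grouped self-attention (`GroupedSelfAttention2D` is a hypothetical helper, not part of this repo):

```python
import torch
import torch.nn as nn

class GroupedSelfAttention2D(nn.Module):
    """Split the H x W positions into non-overlapping windows and run
    self-attention only inside each window, so memory scales with the
    window size instead of the full feature map."""
    def __init__(self, channels, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                            # x: (B, C, H, W); H, W divisible by window
        b, c, h, w = x.shape
        s = self.window
        # (B, C, H, W) -> (B * num_windows, s*s, C)
        tokens = x.reshape(b, c, h // s, s, w // s, s)
        tokens = tokens.permute(0, 2, 4, 3, 5, 1).reshape(-1, s * s, c)
        out, _ = self.attn(tokens, tokens, tokens)   # attention over s*s tokens, not H*W
        # reverse the window split back to (B, C, H, W)
        out = out.reshape(b, h // s, w // s, s, s, c)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        return x + out                               # residual, like the non-local block

# Each window builds a (64 x 64) attention matrix instead of (4096 x 4096):
block = GroupedSelfAttention2D(channels=64, window=8)
y = block(torch.randn(2, 64, 64, 64))
print(y.shape)                                       # torch.Size([2, 64, 64, 64])
```

The trade-off is that positions in different windows no longer interact; window-based transformers usually alternate window layouts (e.g., shifting) to recover cross-window context.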
QAQEthan commented 2 years ago

@AlexHex7 Thx, I benefited a lot.