AlexHex7 / Non-local_pytorch

Implementation of Non-local Block.
Apache License 2.0

About GPU memory usage #46

Open QAQEthan opened 3 years ago

QAQEthan commented 3 years ago

If the non-local block is applied to a low-level feature map, CUDA runs out of memory. Is this due to the amount of memory required to compute the attention matrix? Looking forward to your reply.

buncybunny commented 3 years ago

I'm also experiencing CUDA out-of-memory errors with the non-local block. I'm trying to use it at the top of my network, as the bbox regression conv head in Faster R-CNN. Do you have any ideas on how to address this?

AlexHex7 commented 3 years ago

@Monkey-D-Luffy-star @vombategeht Hi~

The larger the size (height, width, depth) of the feature maps, the more memory the matrix multiplication occupies.
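To put rough numbers on that, here is a back-of-the-envelope estimate of the pairwise affinity matrix's size for a 2D block (the spatial sizes below are illustrative assumptions, not values from the repo):

```python
# Rough estimate of the attention-matrix memory for a 2D non-local block.
# The spatial sizes here are illustrative assumptions.
H, W = 64, 64                # spatial size of a low-level feature map
N = H * W                    # number of positions (tokens)
bytes_per_float = 4          # float32

# The pairwise affinity matrix has shape (N, N) per sample.
attn_bytes = N * N * bytes_per_float
print(f"{attn_bytes / 1024**2:.0f} MiB per sample")   # 64 MiB for 64x64

# Halving H and W cuts N by 4x and the attention matrix by 16x.
H2, W2 = 32, 32
print(f"{(H2 * W2)**2 * bytes_per_float / 1024**2:.0f} MiB per sample")  # 4 MiB
```

Multiply that by the batch size, and roughly again for the gradients kept for the backward pass, and a low-level placement runs out of memory quickly.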

When I encounter this problem, I will:

  1. reduce the batch size
  2. downsample the feature maps (see the downsampling sketch after this list)
  3. move the non-local block to a higher-level position
  4. apply some optimizations, for example following the ideas of these papers:
     - GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
     - Compact Generalized Non-local Network
  5. follow the idea of the transformer block: split the tokens (height x width x depth) into several groups, then do self-attention within each group (see the grouped-attention sketch after this list)
  6. or directly try a transformer block
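For point 2, a minimal sketch of an embedded-Gaussian non-local block that max-pools the key/value branches, which shrinks the affinity matrix from (HW x HW) to (HW x HW/4). This is in the spirit of the repo's sub_sample option, but the class name, argument names, and pooling factor here are my own assumptions:

```python
import torch
import torch.nn as nn

class SubSampledNonLocal2D(nn.Module):
    """Embedded-Gaussian non-local block with max-pooled key/value branches.
    Illustrative sketch; not the repo's exact implementation."""

    def __init__(self, in_channels, inter_channels=None, pool=2):
        super().__init__()
        inter_channels = inter_channels or in_channels // 2
        self.theta = nn.Conv2d(in_channels, inter_channels, 1)
        self.phi = nn.Sequential(
            nn.Conv2d(in_channels, inter_channels, 1),
            nn.MaxPool2d(pool),          # downsample the keys
        )
        self.g = nn.Sequential(
            nn.Conv2d(in_channels, inter_channels, 1),
            nn.MaxPool2d(pool),          # downsample the values
        )
        self.out = nn.Conv2d(inter_channels, in_channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.phi(x).flatten(2)                     # (B, C', HW/p^2)
        v = self.g(x).flatten(2).transpose(1, 2)       # (B, HW/p^2, C')
        attn = torch.softmax(q @ k, dim=-1)            # (B, HW, HW/p^2)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection
```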
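And for point 5, a rough sketch that splits the spatial positions into non-overlapping windows and runs self-attention only within each window, so each attention matrix is (win² x win²) instead of (HW x HW). The function name and window size are assumptions for illustration:

```python
import torch

def windowed_self_attention(x, win=8):
    """Self-attention restricted to non-overlapping win x win windows.
    x: (B, C, H, W) with H and W divisible by `win`. Illustrative sketch."""
    b, c, h, w = x.shape
    # (B, C, H/win, win, W/win, win) -> (B * num_windows, win*win, C)
    t = x.view(b, c, h // win, win, w // win, win)
    t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, win * win, c)
    attn = torch.softmax(t @ t.transpose(1, 2) / c ** 0.5, dim=-1)
    t = attn @ t                                    # (B*nW, win*win, C)
    # fold the windows back into a (B, C, H, W) map
    t = t.reshape(b, h // win, w // win, win, win, c)
    t = t.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
    return x + t                                    # residual connection
```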
QAQEthan commented 3 years ago

@AlexHex7 Thanks, that helps a lot.