bruinxiong / Modified-CRUNet-and-Residual-Attention-Network.mxnet

:fire::fire:An MXNet implementation of Modified CRUNet & Residual Attention Network:fire::fire:
Apache License 2.0

Does Residual-Attention-Network need a lot more GPU memory than resnet-100? #2

Open IrvingShu opened 6 years ago

bruinxiong commented 6 years ago

@IrvingShu Based on the Residual Attention Network paper, Attention-92 is larger than ResNeXt-101 and ResNet-100 in both parameter count and FLOPs. In other words, it needs more GPU memory. If you are worried about GPU memory, you can deploy MXNet's memonger or switch to a smaller variant such as Attention-56.
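For the memonger route, a minimal sketch looks like this (assuming the `memonger.py` script from https://github.com/dmlc/mxnet-memonger is importable; `get_symbol()` and the input shape are placeholders for your own network and data):

```python
import mxnet as mx
import memonger  # memonger.py from https://github.com/dmlc/mxnet-memonger

# Placeholder: build the Attention-92 (or the smaller Attention-56) symbol
net = get_symbol()

# Search for a memory-efficient execution plan;
# `data` is the input shape (batch, channels, height, width)
net_planned = memonger.search_plan(net, data=(80, 3, 224, 224))

# Train with the planned symbol exactly as with the original one
mod = mx.mod.Module(symbol=net_planned, context=mx.gpu(0))
```

The planned symbol trades some recomputation in the backward pass for a lower peak memory footprint, so training is a bit slower but fits in less GPU memory.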

bruinxiong commented 6 years ago

Dear cavalleria,

Thank you for your attention. In my case, I use 8 GPUs (Titan X, 12 GB each) with a total batch size of 640 = 80 x 8, and I also deploy memonger during the training stage. So you have two options: 1. reduce the batch size per GPU; 2. use memonger during the training stage.
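Concretely, the two options might look like this in a Module-based training script (a sketch; `get_symbol()` is a placeholder, and the symbol can optionally be passed through memonger first, as in the earlier sketch):

```python
import mxnet as mx

net = get_symbol()  # placeholder: the network symbol, optionally memonger-planned

# Option 1: reduce the per-GPU batch size
ctx = [mx.gpu(i) for i in range(8)]     # 8 x Titan X (12 GB each)
per_gpu_batch = 80                      # lower this value if you hit OOM
total_batch = per_gpu_batch * len(ctx)  # 80 x 8 = 640 in this setup

# Option 2: pass a memonger-planned symbol as `net` (see the sketch above)
mod = mx.mod.Module(symbol=net, context=ctx,
                    data_names=['data'], label_names=['softmax_label'])
```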

Thanks!

Best Regards,

Yours,

Xiong Lin
Panasonic R&D Center Singapore
Core Technology Group, Learning & Vision
202 Bedok South Avenue 1 #02-11, Singapore 469332
Mobile: +65 83752875

On 4 September 2018 at 12:48 +0800, cavalleria notifications@github.com wrote:

@bruinxiong
dataset: EMore
network backbone: c116 (res_unit=3, output=E, emb_size=256, prelu), CRU_Net
loss function: arcface (m=0.5)
input size: 112x112
batch size: 80 (4 x P40, 24 GB each)
lr: 0.1
When I train c116 I run out of memory with batch size 80x4=320. If I set the batch size to 40x4, about 20 GB of memory is used per GPU. How is this going?

cavalleria commented 6 years ago

@bruinxiong Thanks for your reply. I deployed memonger during the training phase; now the batch size is 4x160=640 and memory usage is ~19 GB per GPU.