IrvingShu opened this issue 6 years ago
Dear cavalleria,
Thank you for your attention. In my case, I use 8 GPUs (Titan X, 12 GB each) with a total batch size of 640 (80 x 8), and I also deploy memonger during the training stage. So you have two options: 1. reduce the batch size on each GPU, or 2. use memonger during the training stage (see the memonger sketch below).
Thanks!
Best Regards,
Xiong Lin
Panasonic R&D Center Singapore, Core Technology Group, Learning & Vision

> On 4 Sep 2018, 12:48 +0800, cavalleria (notifications@github.com) wrote:
> @bruinxiong
> dataset: EMore
> network backbone: c116 (res_unit=3, output=E, emb_size=256, prelu), CRU_Net
> loss function: arcface (m=0.5)
> input size: 112x112
> batch size: 80 (4 x P40, 24 GB)
> lr: 0.1
> When I train c116 I run out of memory; the batch size is 80x4=320. If I set the batch size to 40x4, about 20 GB of memory is used per GPU. How is this going?
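For reference, here is a minimal sketch of option 2 (memory mirroring with memonger). It assumes `memonger.py` from dmlc/mxnet-memonger is importable; the toy network, shapes, and batch sizes below are placeholders, not the actual c116/CRU_Net backbone.

```python
# Sketch only: apply MXNet's memonger to trade extra compute for lower
# activation memory before building the training Module.
import mxnet as mx
import memonger  # memonger.py from dmlc/mxnet-memonger (assumption)

def toy_net(num_classes=10):
    # Stand-in backbone; memonger works on any mx.sym graph.
    data = mx.sym.Variable('data')
    body = data
    for i in range(4):
        body = mx.sym.Convolution(body, num_filter=64, kernel=(3, 3),
                                  pad=(1, 1), name='conv%d' % i)
        body = mx.sym.Activation(body, act_type='relu', name='relu%d' % i)
        # Mark candidate re-computation boundaries, as done in the
        # memonger/insightface examples.
        body._set_attr(mirror_stage='True')
    body = mx.sym.Pooling(body, global_pool=True, pool_type='avg', kernel=(7, 7))
    fc = mx.sym.FullyConnected(mx.sym.Flatten(body), num_hidden=num_classes)
    return mx.sym.SoftmaxOutput(fc, name='softmax')

per_gpu_batch = 80
num_gpu = 8
net = toy_net()

# search_plan returns an equivalent symbol planned so that some intermediate
# activations are re-computed in the backward pass instead of being stored.
net_mem = memonger.search_plan(net, data=(per_gpu_batch * num_gpu, 3, 112, 112))

# Use the planned symbol exactly like the original one.
mod = mx.mod.Module(net_mem, context=[mx.gpu(i) for i in range(num_gpu)])
```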
@bruinxiong Thanks for your reply. I deployed memonger during the training phase; the batch size is now 4x160=640 and memory usage is ~19 GB per GPU.
@IrvingShu According to the Residual Attention Network paper, Attention-92 has more parameters and FLOPs than ResNeXt-101 and ResNet-100. In other words, it needs more GPU memory. If you are worried about GPU memory, you can deploy MXNet's memonger or switch to a smaller variant such as Attention-56.
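A rough way to compare backbone sizes before committing to one: count the learnable parameters of the symbol. This is a sketch; the `build_backbone` calls in the usage comment are hypothetical stand-ins for whatever symbol builder you use, while `infer_shape` and `list_arguments` are standard MXNet Symbol APIs.

```python
import numpy as np
import mxnet as mx

def count_parameters(sym, data_shape=(1, 3, 112, 112)):
    """Return the number of learnable parameters of an MXNet symbol."""
    arg_shapes, _, _ = sym.infer_shape(data=data_shape)
    total = 0
    for name, shape in zip(sym.list_arguments(), arg_shapes):
        if name == 'data' or name.endswith('_label'):
            continue  # data and label placeholders are not parameters
        total += int(np.prod(shape))
    return total

# Usage (build_backbone is hypothetical):
# small = count_parameters(build_backbone('attention-56'))
# large = count_parameters(build_backbone('attention-92'))
# print('attention-56: %.1fM params, attention-92: %.1fM params'
#       % (small / 1e6, large / 1e6))
```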