Open IrvingShu opened 6 years ago
Yes, so please deploy "memonger" when you train your model.
How do I deploy "memonger"? Is it enough to add "export MXNET_BACKWARD_DO_MIRROR=1"?
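For anyone who wants to try the environment variable from inside the training script rather than the shell, here is a minimal sketch. MXNET_BACKWARD_DO_MIRROR is MXNet's documented switch for backward mirroring (it trades extra computation for lower memory). Whether setting it alone is what was meant by "memonger" above is not confirmed in this thread. The helper name enable_backward_mirror is hypothetical, and the mxnet import is left commented out on the assumption that MXNet is installed.

```python
import os


def enable_backward_mirror(env=os.environ):
    """Set MXNET_BACKWARD_DO_MIRROR=1 (hypothetical helper, not part of MXNet)."""
    # Equivalent to `export MXNET_BACKWARD_DO_MIRROR=1` in the shell.
    env["MXNET_BACKWARD_DO_MIRROR"] = "1"
    return env["MXNET_BACKWARD_DO_MIRROR"]


# Set the variable before MXNet is imported, so it is already in place
# when the training graph is bound.
enable_backward_mirror()

# import mxnet as mx  # assumption: import MXNet only after the variable is set
```

Setting the variable at the very top of the script, before any MXNet import, avoids depending on when the library reads it.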
Is your num_classes 140000? I have only about 1/10 as many classes, but my training takes the same amount of memory as yours.
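A rough back-of-the-envelope estimate may help explain why the class count makes little difference. The sketch below sizes the final classification layer only. The 140000 classes and batch size 60 come from the log in this thread, while the embedding size of 512 and 4-byte float32 values are assumptions, and the function names are hypothetical.

```python
def fc_weight_mib(num_classes, embedding_size=512, bytes_per_value=4):
    """Approximate size in MiB of an (embedding_size x num_classes) weight matrix.

    embedding_size=512 and float32 storage are assumptions, not values
    taken from the thread.
    """
    return num_classes * embedding_size * bytes_per_value / (1024 ** 2)


def logits_mib(batch_size, num_classes, bytes_per_value=4):
    """Approximate size in MiB of one batch of class scores."""
    return batch_size * num_classes * bytes_per_value / (1024 ** 2)


# Compare the class count from the log with one tenth of it.
for n in (140000, 14000):
    print(n, round(fc_weight_mib(n), 1), round(logits_mib(60, n), 1))
```

Under these assumptions the classifier weights come to roughly 273 MiB for 140000 classes, and a few times that once gradients and optimizer state are counted. That is small next to the roughly 16 GB in use, so the activations of the backbone may account for most of the memory regardless of num_classes.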
I have tested Crunet 56 and 112; however, they need a lot of GPU memory. For example:
num_classes: 140000
init crunet 56
Called with argument: Namespace(backlight=0, batch_size=60)
|   0  Tesla V100-PCIE...  Off  | 00000000:04:00.0 Off |                    0 |
| N/A   47C    P0    63W / 250W |  15956MiB / 16152MiB |     81%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   48C    P0    52W / 250W |  16008MiB / 16152MiB |     85%      Default |
+-------------------------------+----------------------+----------------------+