HobbitLong / RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
BSD 2-Clause "Simplified" License

Memory issue about the NST LOSS #12

Closed leoozy closed 4 years ago

leoozy commented 4 years ago

Hi, thank you for your code. I am trying to reimplement your benchmark. While running the NST loss, the Gram matrix is too large: even on an 8 × P100 (32GB) machine it runs out of memory. Could you please tell me whether you use some memory trick? Thank you.
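(A minimal sketch, not the repo's actual `NST.py`, of why the full-MMD form of the NST loss is memory-hungry: it builds an (HW × HW) kernel matrix per sample for student, teacher, and cross terms. The function name, shapes, and degree-2 polynomial kernel are illustrative assumptions based on the NST paper.)

```python
import torch
import torch.nn.functional as F

def full_mmd_nst(f_s, f_t):
    """Illustrative full-MMD NST loss; f_s, f_t are feature maps of shape (B, C, H, W)."""
    f_s = F.normalize(f_s.flatten(2), dim=1)   # (B, C, HW): unit-normalize each spatial column
    f_t = F.normalize(f_t.flatten(2), dim=1)
    # Pairwise linear kernels between spatial positions: each tensor is (B, HW, HW).
    # With HW = 32*32 = 1024 and a large batch, these three matrices plus their
    # autograd buffers dominate GPU memory.
    g_ss = torch.bmm(f_s.transpose(1, 2), f_s)
    g_tt = torch.bmm(f_t.transpose(1, 2), f_t)
    g_st = torch.bmm(f_s.transpose(1, 2), f_t)
    # Degree-2 polynomial kernel MMD^2 estimate.
    return g_ss.pow(2).mean() + g_tt.pow(2).mean() - 2 * g_st.pow(2).mean()

# Dummy CIFAR-sized activations just to exercise the function.
loss = full_mmd_nst(torch.randn(64, 64, 32, 32), torch.randn(64, 64, 32, 32))
```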

HobbitLong commented 4 years ago

For most of the combinations, it should be straightforward to run NST. I am not sure which combination you are running.

If it cannot fit into memory, try: (1) setting the full loss to false here; (2) reducing the batch size (and scaling down the learning rate accordingly); (3) using mixed-precision training (see the sketch below).
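(For suggestion (3), a minimal sketch of mixed-precision training using `torch.cuda.amp`, a stock PyTorch utility rather than something shipped in this repo; the tiny stand-in model, loss, and dummy batch are placeholders, not the repo's training loop.)

```python
import torch
import torch.nn as nn

# Stand-in network and objective; in practice this would be the student plus the KD loss.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(16, 100)).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
scaler = torch.cuda.amp.GradScaler()

# Dummy CIFAR-sized batch.
images = torch.randn(64, 3, 32, 32, device="cuda")
labels = torch.randint(0, 100, (64,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():          # run the forward pass in fp16 where safe
    loss = criterion(model(images), labels)
scaler.scale(loss).backward()            # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)
scaler.update()
```

Mixed precision roughly halves the activation memory of the large kernel matrices, which is often enough to make NST fit without changing the batch size.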

HobbitLong commented 4 years ago

@leoozy, it seems the problem got solved? I am closing the issue now, but feel free to reopen anytime.