WangYZ1608 / Knowledge-Distillation-via-ND

The official implementation for paper: Improving Knowledge Distillation via Regularizing Feature Norm and Direction

use only 1 gpu per node? #5

Closed. HanGuangXin closed this issue 1 year ago.

HanGuangXin commented 1 year ago

Hi, I'm not quite familiar with distributed training, but it seems the training script only uses 1 GPU per node. What should I do if the node has 8 GPUs?

HanGuangXin commented 1 year ago

@WangYZ1608 Sorry to bother, but do you have any hints?

WangYZ1608 commented 1 year ago

It works without modification.
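For context, "works without modification" is consistent with the standard PyTorch DDP launch pattern, where the number of GPUs per node is controlled by the launcher rather than the training code. Below is a minimal sketch of that pattern, assuming the repo follows the usual `DistributedDataParallel` setup; the script name `train.py` and the placeholder model are illustrative, not the repo's actual code.

```python
# Minimal single-node, multi-GPU DDP sketch (assumption: the repo follows the
# standard PyTorch DistributedDataParallel pattern; names here are illustrative).
#
# Launch one process per GPU, e.g. for 8 GPUs on one node:
#   torchrun --nproc_per_node=8 train.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each spawned process,
    # so init_process_group with the default env:// init needs no extra args.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 10).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])

    # ... build a DistributedSampler-backed dataloader and train as usual ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Under this pattern, using 8 GPUs instead of 1 only changes the `--nproc_per_node` value passed to the launcher; the training script itself does not need to be edited.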