NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

Should test_iter depend on the number of GPUs? #1604

Open RSly opened 7 years ago

RSly commented 7 years ago

Hi,

I wonder whether DIGITS calculates test_iter correctly.

Imagine a train/val set of 1000/150. Consider the following cases:

  1. train with TEST batch_size 1 on 1 GPU => test_iter = 150, as in DIGITS
  2. train with TEST batch_size 5 on 1 GPU => test_iter = 150/5 = 30, as in DIGITS
  3. train with TEST batch_size 5 on 3 GPUs => test_iter = 150/5/3 = 10, but here DIGITS again uses 30, not 10

Am I missing something in the 3rd case with 3 GPUs? Which is correct: test_iter = 10 or 30?
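
The arithmetic in the three cases above can be written as a short sketch. This is not DIGITS code: `compute_test_iter` and its parameters are hypothetical names used only for illustration. It assumes, as case 3 does, that when the TEST phase runs on several GPUs each GPU processes its own batch of `batch_size` per iteration. Rounding up when the validation set does not divide evenly is also an assumption.

```python
def compute_test_iter(num_val, batch_size, num_gpus=1, test_is_multi_gpu=False):
    """Hypothetical helper: number of TEST iterations needed to cover the val set.

    num_val           -- number of validation samples (150 in the example)
    batch_size        -- TEST batch size per GPU
    num_gpus          -- GPUs used for the job
    test_is_multi_gpu -- assumption flag: True if the TEST phase is split
                         across all GPUs, False if it runs on a single GPU
    """
    # Samples consumed per test iteration across all participating GPUs.
    gpus_in_test = num_gpus if test_is_multi_gpu else 1
    samples_per_iter = batch_size * gpus_in_test
    # Integer ceiling division so a partial final batch is still counted
    # (an assumption; the cases in this thread all divide evenly).
    return (num_val + samples_per_iter - 1) // samples_per_iter


print(compute_test_iter(150, 1))                               # case 1
print(compute_test_iter(150, 5))                               # case 2
print(compute_test_iter(150, 5, 3, test_is_multi_gpu=True))    # case 3, multi-GPU test
print(compute_test_iter(150, 5, 3, test_is_multi_gpu=False))   # case 3, single-GPU test
```

With the flag off, case 3 gives 30, the value DIGITS uses; with it on, it gives 10. Which one is right depends only on whether the TEST phase actually runs on all GPUs.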

lukeyeager commented 7 years ago

Older versions of BVLC/caffe and NVIDIA/caffe run the TEST phase only in single-GPU mode (even if training is multi-GPU). I'm not sure about newer versions.

RSly commented 7 years ago

I ran a small benchmark. It seems that caffe 0.16 in multi-GPU mode also runs the test phase on multiple GPUs (which is cool).

So the test_iter calculation in DIGITS should be updated to take the number of GPUs into account. /cc @drnikolaev maybe you know best?

drnikolaev commented 7 years ago

@RSly yes, the test phase is now multi-GPU