Tencent / PocketFlow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
https://pocketflow.github.io

batch_size related #67

Closed. backyes closed this issue 5 years ago.

backyes commented 5 years ago
learners/uniform_quantization_tf/learner.py:    nb_iters = int(np.ceil(float(FLAGS.nb_smpls_eval) / FLAGS.batch_size_eval))
learners/uniform_quantization_tf/learner.py:        images.set_shape((FLAGS.batch_size, images.shape[1], images.shape[2], images.shape[3]))
learners/uniform_quantization_tf/learner.py:    speed = FLAGS.batch_size * FLAGS.summ_step / time_step
learners/weight_sparsification/learner.py:    nb_iters = int(np.ceil(float(FLAGS.nb_smpls_eval) / FLAGS.batch_size))
learners/weight_sparsification/learner.py:    speed = FLAGS.batch_size * FLAGS.summ_step / time_step
learners/weight_sparsification/pr_optimizer.py:      nb_iters = FLAGS.nb_smpls_eval // FLAGS.batch_size_eval
learners/nonuniform_quantization/learner.py:  batch_size = FLAGS.batch_size if not FLAGS.enbl_multi_gpu else FLAGS.batch_size * mgw.size()
learners/nonuniform_quantization/learner.py:  nb_batches_per_epoch = int(FLAGS.nb_smpls_train / batch_size)
learners/nonuniform_quantization/learner.py:  init_lr = FLAGS.lrn_rate_init * FLAGS.batch_size * mgw_size / FLAGS.batch_size_norm \
learners/nonuniform_quantization/learner.py:    nb_iters = int(np.ceil(float(FLAGS.nb_smpls_eval) / FLAGS.batch_size_eval))
learners/nonuniform_quantization/learner.py:        images.set_shape((FLAGS.batch_size, images.shape[1], images.shape[2], \
learners/nonuniform_quantization/learner.py:        images.set_shape((FLAGS.batch_size, images.shape[1], images.shape[2], \
learners/nonuniform_quantization/learner.py:    speed = FLAGS.batch_size * FLAGS.summ_step / (timer() - time_prev)
learners/nonuniform_quantization/bit_optimizer.py:    nb_iters = FLAGS.nb_smpls_eval // FLAGS.batch_size_eval
learners/nonuniform_quantization/bit_optimizer.py:    speed = FLAGS.batch_size * self.tune_global_disp_steps / (timer() - time_prev)
learners/uniform_quantization/learner.py:  batch_size = FLAGS.batch_size if not FLAGS.enbl_multi_gpu else FLAGS.batch_size * mgw.size()
learners/uniform_quantization/learner.py:  nb_batches_per_epoch = int(FLAGS.nb_smpls_train / batch_size)
learners/uniform_quantization/learner.py:  init_lr = FLAGS.lrn_rate_init * FLAGS.batch_size * mgw_size / FLAGS.batch_size_norm if FLAGS.enbl_multi_gpu else FLAGS.lrn_rate_init
learners/uniform_quantization/learner.py:    nb_iters = int(np.ceil(float(FLAGS.nb_smpls_eval) / FLAGS.batch_size_eval))
learners/uniform_quantization/learner.py:        images.set_shape((FLAGS.batch_size, images.shape[1], images.shape[2],
learners/uniform_quantization/learner.py:        images.set_shape((FLAGS.batch_size, images.shape[1], images.shape[2],
learners/uniform_quantization/learner.py:    speed = FLAGS.batch_size * FLAGS.summ_step / (timer() - time_prev)
learners/uniform_quantization/bit_optimizer.py:    nb_iters = FLAGS.nb_smpls_eval // FLAGS.batch_size_eval
learners/uniform_quantization/bit_optimizer.py:    speed = FLAGS.batch_size * self.tune_global_disp_steps / (timer() - time_prev)
learners/distillation_helper.py:    nb_iters = int(np.ceil(float(FLAGS.nb_smpls_eval) / FLAGS.batch_size_eval))
learners/full_precision/learner.py:    nb_iters = int(np.ceil(float(FLAGS.nb_smpls_eval) / FLAGS.batch_size))
learners/full_precision/learner.py:    speed = FLAGS.batch_size * FLAGS.summ_step / time_step
learners/discr_channel_pruning/learner.py:    nb_iters = int(np.ceil(float(FLAGS.nb_smpls_eval) / FLAGS.batch_size_eval))
learners/discr_channel_pruning/learner.py:    speed = FLAGS.batch_size * FLAGS.summ_step / time_step
learners/channel_pruning/learner.py:    nb_iters = FLAGS.nb_smpls_eval // FLAGS.batch_size_eval
learners/channel_pruning/learner.py:        image_shape[0] = FLAGS.batch_size
learners/channel_pruning/learner.py:        label_shape[0] = FLAGS.batch_size
learners/channel_pruning/learner.py:    speed = FLAGS.batch_size * FLAGS.summ_step / (timer() - self.time_prev)
learners/channel_pruning/model_wrapper.py:        flops = flops / 2. / FLAGS.batch_size
learners/channel_pruning/channel_pruner.py:    nb_points_per_batch = FLAGS.cp_nb_points_per_layer * FLAGS.batch_size

While tuning distributed training performance and digging into the per-step statistics, I found that batch_size is a critical variable.

The code shows that the full_precision learner uses batch_size straight from the command line, which means the total batch size is nb_gpus * batch_size under synchronous SGD. In the other learners, however, batch_size is explicitly multiplied by nb_gpus and then used to set some hyper-parameters (e.g. nb_batches_per_epoch and init_lr) carefully (sketched below).

To clarify this key point, could the community give some higher-level comments on how 'batch_size' is meant to be interpreted?
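
Here is a minimal sketch of the two conventions as I read them from the grep output above. The helper functions and the concrete numbers (64 samples per GPU, 8 GPUs, batch_size_norm = 128, ImageNet-sized training set) are my own, only mgw.size() and the FLAGS names come from PocketFlow:

```python
# Minimal sketch (my own helpers, not PocketFlow code); only mgw.size() and the
# FLAGS names mirror the lines grep'd above. Concrete numbers are hypothetical.

def effective_batch_size(batch_size_per_gpu, nb_gpus, enbl_multi_gpu):
  """Samples consumed per optimization step under synchronous data-parallel SGD."""
  return batch_size_per_gpu * nb_gpus if enbl_multi_gpu else batch_size_per_gpu

def scaled_init_lr(lrn_rate_init, batch_size_per_gpu, nb_gpus, batch_size_norm, enbl_multi_gpu):
  """Linear learning-rate scaling, as in the quantization learners' init_lr lines."""
  if enbl_multi_gpu:
    return lrn_rate_init * batch_size_per_gpu * nb_gpus / batch_size_norm
  return lrn_rate_init

# Hypothetical run: 64 samples per GPU, 8 GPUs, batch_size_norm = 128.
bs_eff = effective_batch_size(64, 8, enbl_multi_gpu=True)       # 512 samples per step
nb_batches_per_epoch = 1281167 // bs_eff                        # ImageNet-sized training set
init_lr = scaled_init_lr(0.1, 64, 8, 128, enbl_multi_gpu=True)  # 0.1 * 512 / 128 = 0.4

# The full_precision learner, by contrast, derives its statistics from the raw
# FLAGS.batch_size, leaving the nb_gpus factor implicit in the gradient allreduce.
print(bs_eff, nb_batches_per_epoch, init_lr)
```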

jiaxiang-wu commented 5 years ago

For all learners, the actual batch size, i.e. the number of training samples used for gradient computation and averaging, equals FLAGS.batch_size times the number of GPUs. If you find any inconsistent implementation, please let us know so that we can fix it.
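
To illustrate why gradient averaging gives this effective batch size, here is a toy NumPy check (not PocketFlow code; the linear-regression loss and all numbers are made up): averaging the per-GPU mean gradients over nb_gpus shards of FLAGS.batch_size samples each is identical to computing one mean gradient over the full FLAGS.batch_size * nb_gpus batch.

```python
# Toy check: averaged per-GPU gradients == one big-batch gradient (hypothetical example).
import numpy as np

np.random.seed(0)
nb_gpus, batch_size, dim = 4, 8, 5
w = np.random.randn(dim)

# One big batch of nb_gpus * batch_size samples (the "actual batch size").
x = np.random.randn(nb_gpus * batch_size, dim)
y = np.random.randn(nb_gpus * batch_size)
grad_big = x.T @ (x @ w - y) / (nb_gpus * batch_size)   # gradient of 0.5 * mean squared error

# The same samples split across GPUs, each computing its own mean gradient,
# then averaged by allreduce (synchronous SGD).
grads = [xs.T @ (xs @ w - ys) / batch_size
         for xs, ys in zip(np.split(x, nb_gpus), np.split(y, nb_gpus))]
grad_avg = np.mean(grads, axis=0)

print(np.allclose(grad_big, grad_avg))   # True: effective batch = batch_size * nb_gpus
```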