Tencent / PocketFlow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
https://pocketflow.github.io

Why do learners hard-code dataset-specific logic, e.g. different learning-rate control policies for different datasets? #23

Open · backyes opened this issue 5 years ago

backyes commented 5 years ago

E.g.

root@linuxkit-025000000001:/Users/test/work/PocketFlow/learners# grep cifar_10 * -r
nonuniform_quantization/bit_optimizer.py:      if self.dataset_name == 'cifar_10':
nonuniform_quantization/learner.py:  if dataset_name == 'cifar_10':
nonuniform_quantization/learner.py:        if self.dataset_name == 'cifar_10':
nonuniform_quantization/learner.py:        if self.dataset_name == 'cifar_10':
uniform_quantization/bit_optimizer.py:      if self.dataset_name == 'cifar_10':
uniform_quantization/learner.py:  if dataset_name == 'cifar_10':
uniform_quantization/learner.py:        if self.dataset_name == 'cifar_10':
uniform_quantization/learner.py:        if self.dataset_name == 'cifar_10':
uniform_quantization_tf/learner.py:        #if self.dataset_name == 'cifar_10':
weight_sparsification/pr_optimizer.py:    skip_head_n_tail = (self.dataset_name == 'cifar_10')  # skip head & tail layers on CIFAR-10

In more detail:

def setup_bnds_decay_rates(model_name, dataset_name):
  """ NOTE: The bnd_decay_rates here is mgw_size invariant """

  batch_size = FLAGS.batch_size if not FLAGS.enbl_multi_gpu else FLAGS.batch_size * mgw.size()
  nb_batches_per_epoch = int(FLAGS.nb_smpls_train / batch_size)
  mgw_size = int(mgw.size()) if FLAGS.enbl_multi_gpu else 1
  init_lr = FLAGS.lrn_rate_init * FLAGS.batch_size * mgw_size / FLAGS.batch_size_norm if FLAGS.enbl_multi_gpu else FLAGS.lrn_rate_init
  if dataset_name == 'cifar_10':
    if model_name.startswith('resnet'):
      bnds = [nb_batches_per_epoch * 15, nb_batches_per_epoch * 40]
      decay_rates = [1e-3, 1e-4, 1e-5]
  elif dataset_name == 'ilsvrc_12':
    if model_name.startswith('resnet'):
      bnds = [nb_batches_per_epoch * 5, nb_batches_per_epoch * 20]
      decay_rates = [1e-4, 1e-5, 1e-6]
    elif model_name.startswith('mobilenet'):
      bnds = [nb_batches_per_epoch * 5, nb_batches_per_epoch * 30]
      decay_rates = [1e-4, 1e-5, 1e-6]
  finetune_steps = nb_batches_per_epoch * FLAGS.uql_quant_epochs
  init_lr = init_lr if FLAGS.enbl_warm_start else FLAGS.lrn_rate_init
  return init_lr, bnds, decay_rates, finetune_steps

I suspect this design is a strong limitation when testing on new datasets. I would appreciate some feedback on this design choice.
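For example, with the branching above, any dataset other than cifar_10 / ilsvrc_12 (or a model whose name does not start with 'resnet' / 'mobilenet') leaves bnds and decay_rates unassigned. A minimal, self-contained sketch of that branching (simplified names and a hypothetical default, not the actual PocketFlow code) shows the failure mode:

def setup_bnds_decay_rates_sketch(model_name, dataset_name, nb_batches_per_epoch=100):
  """Simplified copy of the dataset-specific branching quoted above."""
  if dataset_name == 'cifar_10':
    if model_name.startswith('resnet'):
      bnds = [nb_batches_per_epoch * 15, nb_batches_per_epoch * 40]
      decay_rates = [1e-3, 1e-4, 1e-5]
  elif dataset_name == 'ilsvrc_12':
    if model_name.startswith('resnet'):
      bnds = [nb_batches_per_epoch * 5, nb_batches_per_epoch * 20]
      decay_rates = [1e-4, 1e-5, 1e-6]
  # any other dataset (or model) falls through without assigning bnds / decay_rates
  return bnds, decay_rates

setup_bnds_decay_rates_sketch('resnet_20', 'cifar_10')    # works
setup_bnds_decay_rates_sketch('resnet_20', 'my_dataset')  # UnboundLocalError on 'bnds'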

Many thanks.

jiaxiang-wu commented 5 years ago

We are planning to remove the dependency on the dataset name in UniformQuantLearner and NonUniformQuantLearner. In most scenarios, such hard-coded logic can be replaced by FLAGS variables for higher flexibility. In particular, the setup_bnds_decay_rates function you listed can be removed by defining a decaying factor for the learning rate, as we did in UniformQuantTFLearner. For WeightSparseLearner, the skip_head_n_tail variable can also be turned into a FLAGS variable.
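As a rough illustration, something along these lines would let users supply the schedule per run instead of per dataset (the flag names and function below are a hypothetical sketch, not PocketFlow's actual API):

import tensorflow as tf

FLAGS = tf.app.flags.FLAGS

# hypothetical flags: decay boundaries (in epochs) and per-stage learning rates
tf.app.flags.DEFINE_string('uql_lrn_rate_bnds', '15,40',
                           'epoch boundaries for learning-rate decay (comma-separated)')
tf.app.flags.DEFINE_string('uql_lrn_rate_vals', '1e-3,1e-4,1e-5',
                           'learning rate used in each stage (comma-separated)')
tf.app.flags.DEFINE_boolean('ws_skip_head_n_tail', False,
                            'skip head & tail layers in weight sparsification')

def setup_bnds_decay_rates(nb_batches_per_epoch):
  """Build decay boundaries & rates from flags, with no dataset-name branching."""
  bnds = [nb_batches_per_epoch * int(e) for e in FLAGS.uql_lrn_rate_bnds.split(',')]
  decay_rates = [float(v) for v in FLAGS.uql_lrn_rate_vals.split(',')]
  return bnds, decay_rates

A single multiplicative decay factor, as mentioned above for UniformQuantTFLearner, would reduce this further to one float flag.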

jiaxiang-wu commented 5 years ago

Enhancement required: remove the hard-coded logic related to dataset names from the learners' implementations.

backyes commented 5 years ago

Got it. Thanks for your kind reply.