diux-dev/cluster: train on AWS

ImageNet: scale_lr is confusing #54

Closed: yaroslavvb closed this issue 6 years ago

yaroslavvb commented 6 years ago

Can the role of scale_lr be documented? This is the 18-minute, 16-machine configuration, and it's a bit counter-intuitive that it's achieved with scale_lr=8. @bearpelican

x16ar_args_benchmark = [  # lr is defined earlier in the launch script
  '--phases', [
    {'ep': 0,        'sz': 128, 'bs': 64, 'trndir': '-sz/160'},
    {'ep': (0, 6),   'lr': (lr, lr*2)},      # warmup: lr ramps to 2*lr over epochs 0-6
    {'ep': 6,        'bs': 128, 'keep_dl': True},
    {'ep': 6,        'lr': lr*2},
    {'ep': 16,       'sz': 224, 'bs': 64},   # todo: increase this bs
    {'ep': 16,       'lr': lr},
    {'ep': 19,       'bs': 192, 'keep_dl': True},
    {'ep': 19,       'lr': 2*lr/(10/1.5)},
    {'ep': 31,       'lr': 2*lr/(100/1.5)},
    {'ep': 37,       'sz': 288, 'bs': 128, 'min_scale': 0.5, 'use_ar': True},
    {'ep': 37,       'lr': 2*lr/100},
    {'ep': (38, 40), 'lr': 2*lr/1000}
  ],
  '--init-bn0',
  '--no-bn-wd',
  '--scale-lr', 8,  # 8 = num tasks
  '--num-tasks', 16,
  '--ami-name', 'pytorch.imagenet.source.v6',
  '--env-name', 'pytorch_source',
]
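For context, here is a minimal sketch of how a flag like --scale-lr is commonly applied, assuming it linearly multiplies the base learning rate. The actual arithmetic in the training script is exactly what this issue asks to have documented, so treat the function below as hypothetical:

# Hypothetical sketch only: assumes --scale-lr multiplies the base lr
# (the linear scaling rule). The real script's behavior is undocumented,
# which is the point of this issue.

def effective_lr(base_lr, scale_lr):
    # Linear LR scaling: the whole schedule is multiplied by scale_lr.
    return base_lr * scale_lr

base_lr = 0.47                     # placeholder value; the real lr is set elsewhere
print(effective_lr(base_lr, 8))    # --scale-lr 8 -> 3.76
# The confusing part: with --num-tasks 16, the usual "scale lr linearly
# with the number of workers" rule would suggest a factor of 16, not 8.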
bearpelican commented 6 years ago

Removed in https://github.com/diux-dev/cluster/pull/69

yaroslavvb commented 6 years ago

The 16-machine setup still has scale-lr. What's the equivalent config after the refactor? Currently it's:

x16ar_args_benchmark = [  # lr is defined earlier in the launch script
  '--phases', [
    {'ep': 0,        'sz': 128, 'bs': 64, 'trndir': '-sz/160'},
    {'ep': (0, 6),   'lr': (lr, lr*2)},      # warmup: lr ramps to 2*lr over epochs 0-6
    {'ep': 6,        'bs': 128, 'keep_dl': True},
    {'ep': 6,        'lr': lr*2},
    {'ep': 16,       'sz': 224, 'bs': 64},   # todo: increase this bs
    {'ep': 16,       'lr': lr},
    {'ep': 19,       'bs': 192, 'keep_dl': True},
    {'ep': 19,       'lr': 2*lr/(10/1.5)},
    {'ep': 31,       'lr': 2*lr/(100/1.5)},
    {'ep': 37,       'sz': 288, 'bs': 128, 'min_scale': 0.5, 'rect_val': True},
    {'ep': 37,       'lr': 2*lr/100},
    {'ep': (38, 40), 'lr': 2*lr/1000}
  ],
  '--init-bn0',
  '--no-bn-wd',
  '--scale-lr', 8,  # 8 = num tasks
  '--num-tasks', 16,
  '--ami-name', DEFAULT_PYTORCH_SOURCE,
  '--env-name', 'pytorch_source',
]
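For what it's worth, if --scale-lr does nothing more than multiply every lr value in the schedule, one plausible scale-lr-free equivalent would pre-multiply the base rate. This is an assumption, not something confirmed by the PR:

# Assumption: --scale-lr 8 is equivalent to multiplying every phase lr by 8.
# If so, the flag could be dropped by folding the factor into the base rate:
lr_scaled = lr * 8   # fold the old --scale-lr 8 into lr itself

phases_without_scale_lr = [
    {'ep': (0, 6), 'lr': (lr_scaled, lr_scaled*2)},
    {'ep': 6,      'lr': lr_scaled*2},
    # ...and so on: every 'lr' entry above, multiplied by the same factor
]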