JianGoForIt / YellowFin

auto-tuning momentum SGD optimizer
Apache License 2.0

LinAlgError (Array must not contain infs or NaNs) thrown in get_mu_tensor #15

Open ywchan2005 opened 7 years ago

ywchan2005 commented 7 years ago

Below is a simple piece of code to try YellowFin on my dataset.

import tensorflow as tf
import yellowfin

# train_x, train_y, hidden_dim and epochs are defined elsewhere for my dataset.
x = tf.placeholder( tf.float32, [ None, train_x.shape[ 1 ] ] )
y = tf.placeholder( tf.float32, [ None, train_y.shape[ 1 ] ] )
m = tf.layers.dense( x, hidden_dim )
m = tf.layers.batch_normalization( m )
m = tf.nn.elu( m )
m = tf.layers.dense( m, hidden_dim )
m = tf.layers.batch_normalization( m )
m = tf.nn.elu( m )
m = tf.layers.dense( m, hidden_dim )
m = tf.layers.batch_normalization( m )
m = tf.nn.elu( m )
m = tf.layers.dense( m, train_y.shape[ 1 ] )
prediction = tf.nn.softmax( m )
loss = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits( labels=y, logits=m ) )
optimizer = yellowfin.YFOptimizer().minimize( loss )

s = tf.Session()
s.run( tf.global_variables_initializer() )
for epoch in range( epochs ):
    _, h = s.run( [ optimizer, loss ], feed_dict={ x: train_x, y: train_y } )

Usually, it crashes and throws the following exception.

Caused by op 'update_hyper/cond/PyFuncStateless', defined at:
  File "test2.py", line 47, in <module>
    optimizer = yf.YFOptimizer( learning_rate=1., momentum=0. ).minimize( loss )
  File "/data/python-mp-test/libs/yellowfin.py", line 268, in minimize
    return self.apply_gradients(grads_and_vars)
  File "/data/python-mp-test/libs/yellowfin.py", line 223, in apply_gradients
    update_hyper_op = self.update_hyper_param()
  File "/data/python-mp-test/libs/yellowfin.py", line 191, in update_hyper_param
    lambda: self._mu_var) )
  File "/usr/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1814, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1689, in BuildCondBranch
    original_result = fn()
  File "/data/python-mp-test/libs/yellowfin.py", line 190, in <lambda>
    self._mu = tf.identity(tf.cond(self._do_tune, lambda: self.get_mu_tensor(),
  File "/data/python-mp-test/libs/yellowfin.py", line 173, in get_mu_tensor
    roots = tf.py_func(np.roots, [coef], Tout=tf.complex64, stateful=False)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 201, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/gen_script_ops.py", line 56, in _py_func_stateless
    Tout=Tout, name=name)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

UnknownError (see above for traceback): LinAlgError: Array must not contain infs or NaNs
     [[Node: update_hyper/cond/PyFuncStateless = PyFuncStateless[Tin=[DT_FLOAT], Tout=[DT_COMPLEX64], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](update_hyper/cond/ScatterUpdate)]]
JianGoForIt commented 7 years ago

Hi @ywchan2005,

Thanks for trying out the optimizer. This is most likely caused by exploding gradients in the middle of training.

  1. If it happens at the very beginning, you might want to play with the initial learning rate and momentum a bit.

  2. If it happens in the middle of training, please consider using gradient clipping. There is a discussion with solutions in our PyTorch YellowFin repo here. A similar solution can be applied to the TF repo (see the sketch after this list).

  3. We are working on a better automatic gradient clipping feature; you may also wait for that, which should land in a few days. But I suggest you already start with 2 in the meantime.
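For the TF version, a rough sketch of 2 could look like the following (it reuses the same apply_gradients entry point that minimize calls internally, as shown in the traceback above; the clip norm of 1.0 is just an illustrative value):

# Sketch: clip gradients manually before handing them to YellowFin.
grads = tf.gradients(loss, tf.trainable_variables())
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)  # 1.0 is illustrative
train_op = yellowfin.YFOptimizer().apply_gradients(
    list(zip(clipped_grads, tf.trainable_variables())))

The point is just to keep exploding gradients from feeding infs/NaNs into the curvature and variance estimators.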

ywchan2005 commented 7 years ago

Thanks for the update. It happens mostly in the middle of training.

If I change the model to use relu (instead of elu), the issue doesn't happen anymore.

IdanAzuri commented 7 years ago

The same happened to me. First, you can add a debug message to see exactly the values of the coefficients (where the NaN shows up) by replacing these lines in the get_mu_tensor(self) method:

coef = tf.scatter_update(coef, tf.constant(2), -(3 + const_fact))
log_coef = tf.Print(input_=coef, data=[coef], message="coefficients: ")
roots = tf.py_func(np.roots, [log_coef], Tout=tf.complex64, stateful=False)

Second, as @JianGoForIt suggested, using the clip_thresh parameter works for me.
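For reference, the call looks roughly like this (assuming the clip_thresh constructor argument mentioned above; the learning rate, momentum and threshold values here are just placeholders):

optimizer = yellowfin.YFOptimizer(learning_rate=1e-5, momentum=0.0, clip_thresh=1.0).minimize(loss)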

However, another issue pops up: I get a very high learning rate in the early stages of training. That happens even when my initial learning rate is very low (1e-5). The result is that the model diverges. Any ideas how to fix it?

JianGoForIt commented 7 years ago

Hi @IdanAzuri, Thanks for trying out our optimizer.

The estimators might be unstable when there are only a few iterations (i.e. only a few samples for the estimators in our optimizer). There are hacky ways to work around this (clipping gradients / putting an upper bound on the learning rate in the beginning iterations), but I would like to investigate whether there is a systematic issue.

Could you please provide me a minimal example to reproduce your error?

IdanAzuri commented 7 years ago

@JianGoForIt, actually it's still not stable after 400k iterations (which is when I stopped it), so that is not the problem. Sorry, but I don't have a minimal example because my system is very complex, with multiple subnets. Clipping gradients / putting an upper bound on the learning rate doesn't work here; it still diverges.

staticfloat commented 6 years ago

I can confirm that I ran into this using a standard AlexNet architecture being trained on the ImageNet corpus using PyTorch. After 7 full epochs (that is, having trained on 60928 minibatches, each of size 64), I received the following error:

yellowfin.py:192: RuntimeWarning: invalid value encountered in add
  self._grad_var += global_state['grad_norm_squared_avg'] / debias_factor
/var/storage/shared/msrlabs/sabae/libsmolder_autodeploy/libsmolder/optimizers/yellowfin.py:329: RuntimeWarning: invalid value encountered in double_scalars
  self._mu_t = max(root**2, ( (np.sqrt(dr) - 1) / (np.sqrt(dr) + 1) )**2 )

It would be really nice to have gradient clipping or some kind of workaround for this built into YellowFin. :)
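A manual workaround in the meantime could look like the sketch below, where model, optimizer and loss stand in for the real training-loop objects and the max_norm of 1.0 is just a placeholder value:

import torch

# Sketch: clip gradients in the training loop before YellowFin takes its step.
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()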