geffy / tffm

TensorFlow implementation of an arbitrary order Factorization Machine
MIT License

Got NaN issue running on sparse data #9

Closed: VinceShieh closed this issue 8 years ago

VinceShieh commented 8 years ago

Hi,

I tried to run TFFMClassifier on sparse data (for example: https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt), but got an error when fitting. The data is loaded with load_svmlight_file.

Trace log:

    tensorflow.python.framework.errors.InvalidArgumentError: NaN or Inf in target value : Tensor had NaN values
         [[Node: target/CheckNumerics = CheckNumericsT=DT_FLOAT, _class=["loc:@add"], message="NaN or Inf in target value", _device="/job:localhost/replica:0/task:0/cpu:0"]]
    Caused by op u'target/CheckNumerics', defined at:
      File "/usr/local/lib/python2.7/dist-packages/tffm/testtffm.py", line 51, in <module>
        model.fit(X_tr.toarray(), y_tr, show_progress=True)
      File "/usr/local/lib/python2.7/dist-packages/tffm/tffm/base.py", line 242, in fit
        self.core.build_graph()
      File "/usr/local/lib/python2.7/dist-packages/tffm/tffm/core.py", line 208, in build_graph
        self.init_target()
      File "/usr/local/lib/python2.7/dist-packages/tffm/tffm/core.py", line 191, in init_target
        msg='NaN or Inf in target value', name='target')
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/numerics.py", line 42, in verify_tensor_all_finite
        verify_input = array_ops.check_numerics(t, message=msg)

It seems the problem is in self.loss = self.loss_function(self.outputs, self.train_y), which somehow produces NaN values.

Could someone look into this issue? Thanks.

Vimos commented 8 years ago

This is because tf.exp(margins) in the loss function overflows to inf. Maybe you can try truncating the margins:

    margins = tf.minimum(margins, 5)
    margins = tf.maximum(margins, -5)
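
To see why the truncation helps, here is a minimal pure-Python sketch (no TensorFlow; naive_logistic_term and clipped_logistic_term are hypothetical helper names) of the per-example term log(1 + exp(m)), where m is a margin. TensorFlow computes in float32, whose exp overflows to inf once m exceeds about 88.7. Python floats are float64 and overflow only near 709.8, and math.exp raises OverflowError there instead of returning inf, so the sketch maps that case to inf to mimic the tensor. Clamping m to [-5, 5] keeps every value finite.

```python
import math


def naive_logistic_term(margin):
    """log(1 + exp(margin)) computed directly.

    math.exp raises OverflowError for margin > ~709.78 (float64); return inf
    in that case to mimic what a TensorFlow tensor would contain.
    """
    try:
        return math.log(1.0 + math.exp(margin))
    except OverflowError:
        return math.inf


def clipped_logistic_term(margin, bound=5.0):
    """Same term after clamping the margin to [-bound, bound]."""
    margin = min(margin, bound)   # analogous to tf.minimum(margins, 5)
    margin = max(margin, -bound)  # analogous to tf.maximum(margins, -5)
    return math.log(1.0 + math.exp(margin))


# Very large |margin| values, e.g. produced by a badly scaled model output.
for m in [-800.0, -3.0, 0.0, 3.0, 800.0]:
    print(m, naive_logistic_term(m), clipped_logistic_term(m))
```

Inside [-5, 5] the clipped term equals the naive one. Outside it the term is constant, so with tf.minimum/tf.maximum those examples contribute zero gradient until the model brings their margins back into range.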

VinceShieh commented 8 years ago

Right, that solved the problem. Thanks.

geffy commented 8 years ago

Yes, @Vimos is right. I usually get this error when the initialization std is too large. Try decreasing the init_std parameter in the constructor.
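
One way to see the link to init_std: the sketch below evaluates a plain second-order factorization machine, w0 + Σ_i w_i x_i + Σ_{i<j} <v_i, v_j> x_i x_j, with all parameters drawn from N(0, init_std²). That Gaussian initializer, the feature count, the rank and the input distribution are assumptions for illustration; tffm's own initializer and its arbitrary-order model may differ. Because the pairwise term is quadratic in the parameters, the typical |output| (and therefore |margin|) grows much faster than init_std itself.

```python
import random


def fm2_output(x, w0, w, V):
    """Second-order FM prediction:
    w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i], V[j]> * x[i] * x[j].

    The pairwise sum uses the standard O(n*k) identity
    0.5 * sum_f [(sum_i V[i][f]*x[i])**2 - sum_i (V[i][f]*x[i])**2].
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s2)
    return w0 + linear + pairwise


def mean_abs_output(init_std, n_features=100, rank=8, n_trials=50, seed=0):
    """Average |prediction| over random inputs and randomly initialized
    parameters (hypothetical Gaussian init, not necessarily what tffm uses)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        x = [rng.random() for _ in range(n_features)]  # features in [0, 1)
        w = [rng.gauss(0.0, init_std) for _ in range(n_features)]
        V = [[rng.gauss(0.0, init_std) for _ in range(rank)]
             for _ in range(n_features)]
        total += abs(fm2_output(x, 0.0, w, V))
    return total / n_trials


for std in (0.01, 0.1, 1.0):
    print(std, mean_abs_output(std))
```

A smaller init_std only keeps the initial outputs small; it does not prevent the margins from growing during training, whereas clipping them protects the exp call regardless.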

SergeiGorbatiuk commented 6 years ago

Hello! Could you please explain the solution to this problem a bit more clearly? I get this error almost every time. I tried increasing the epsilon parameter of AdamOptimizer and decreasing init_std, but neither worked. I also didn't understand how to apply the explanation given by @Vimos. I would deeply appreciate your help =)

lw00245 commented 5 years ago

In tffm/utils.py there is a loss function named loss_logistic; you can modify it as follows:

    margins = -y * tf.transpose(outputs)
    # clamp the margins to [-5, 5] so that tf.exp(margins) cannot overflow to inf
    margins = tf.minimum(margins, 5)
    margins = tf.maximum(margins, -5)
    # tf.log is available as tf.math.log in TensorFlow 2.x
    raw_loss = tf.log(tf.add(1.0, tf.exp(margins)))
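
An alternative to clipping, offered here only as a suggestion and not as tffm's code, is to evaluate log(1 + exp(m)) in the algebraically equivalent form max(m, 0) + log1p(exp(-|m|)). exp then only sees a non-positive argument, so it cannot overflow, and the loss keeps its true value and gradient for large margins instead of flattening out beyond ±5. In TensorFlow, tf.nn.softplus computes the same function, so raw_loss = tf.nn.softplus(margins) is a candidate replacement worth testing. The plain-Python sketch below checks the identity; stable_softplus and logistic_loss are hypothetical names, and it assumes labels in {-1, +1}, which is what a margin of the form -y * outputs expects.

```python
import math


def stable_softplus(m):
    """log(1 + exp(m)) rewritten as max(m, 0) + log1p(exp(-|m|)).

    exp() only receives -|m| <= 0, so it never overflows; log1p keeps
    precision when exp(-|m|) is tiny.
    """
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))


def logistic_loss(y, output):
    """Per-example logistic loss for a label y in {-1, +1} (assumed),
    using the margin m = -y * output as in the loss_logistic snippet above."""
    return stable_softplus(-y * output)


for y, out in [(1.0, 2.0), (-1.0, 2.0), (1.0, -800.0), (-1.0, -800.0)]:
    print(y, out, logistic_loss(y, out))
```

For a badly misclassified example the loss now grows linearly in the margin (800.0 for m = 800) rather than overflowing to inf.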