affinelayer / pix2pix-tensorflow

Tensorflow port of Image-to-Image Translation with Conditional Adversarial Nets https://phillipi.github.io/pix2pix/
MIT License

Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values #190

Open meihuabo opened 4 years ago

meihuabo commented 4 years ago

Traceback (most recent call last):
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
    [[{{node generator/encoder_1/conv2d/kernel/values}}]]
  (1) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
    [[{{node generator/encoder_1/conv2d/kernel/values}}]]
    [[convert_inputs/convert_image/Minimum/_802]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pix2pix.py", line 815, in <module>
    main()
  File "pix2pix.py", line 781, in main
    results = sess.run(fetches, options=options, run_metadata=run_metadata)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
    [[node generator/encoder_1/conv2d/kernel/values (defined at /home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
    [[node generator/encoder_1/conv2d/kernel/values (defined at /home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
    [[convert_inputs/convert_image/Minimum/_802]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'generator/encoder_1/conv2d/kernel/values':
  File "pix2pix.py", line 815, in <module>
    main()
  File "pix2pix.py", line 709, in main
    tf.summary.histogram(var.op.name + "/values", var)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/summary/summary.py", line 179, in histogram
    tag=tag, values=values, name=scope)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 329, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

meihuabo commented 4 years ago

This is usually caused by an optimizer learning rate that is set too high. Training with a smaller learning rate may solve the problem.
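As a rough illustration of why the step size matters, here is a toy sketch in plain Python. It is not pix2pix code: the function `descend` and the quadratic loss are made up for this example. Gradient descent on f(w) = w² diverges once the learning rate is too large, and the weight overflows to inf and then turns into NaN, which is the kind of value the histogram summary rejects.

```python
import math


def descend(lr, steps=2000, w=2.0):
    """Plain gradient descent on the toy loss f(w) = w**2 (hypothetical example)."""
    for _ in range(steps):
        grad = 2.0 * w     # derivative of w**2
        w = w - lr * grad  # with lr > 1 the magnitude of w grows on every step
    return w


too_large = descend(lr=1.5)  # |w| doubles each step, overflows to inf, then inf - inf gives nan
small = descend(lr=0.1)      # w shrinks by a factor of 0.8 each step and stays finite

print("lr=1.5 ->", too_large, "is NaN:", math.isnan(too_large))
print("lr=0.1 ->", small, "is finite:", math.isfinite(small))
```

For the real training run, the learning rate would be lowered through whatever option pix2pix.py exposes for it (I believe it is a `--lr` flag, but check `python pix2pix.py --help` to be sure).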

skabbit commented 4 years ago

Try TensorFlow v1.14.

jordan-bird commented 3 years ago

This happens if you run TF 2.0 in compatibility mode, whereas TF 1.14, as mentioned by @skabbit, works without the error. I can't see any major differences between the learning rate implementations in compatibility mode and in TF 1.14, so I don't know where the issue actually comes from.

I set up TF 1.14 in a Conda environment and it hasn't crashed in 200 epochs. If you're using TF 2.0 and have no virtual environment available, a hacky workaround is to loop the python command in a bash script and run that. Just remember to decrease the model save interval so you don't lose too many steps when it crashes and restarts. On the bright side, you can leave it running unattended without having to restart it manually.
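The restart loop described above can also be sketched in Python rather than bash. This is only a minimal sketch: `run_with_restarts`, `resume_args`, and the example command in the comments are hypothetical names and placeholders, and the pix2pix.py flag names should be checked against `python pix2pix.py --help` before use.

```python
import subprocess
import sys
import time


def run_with_restarts(cmd, resume_args=(), max_restarts=20, delay_s=5.0):
    """Run cmd; whenever it exits with a non-zero code, rerun it with resume_args appended.

    Returns True if some attempt exits with code 0, False if every attempt fails.
    """
    for attempt in range(max_restarts + 1):
        # The first attempt starts fresh; later attempts add the resume arguments.
        full_cmd = list(cmd) + (list(resume_args) if attempt > 0 else [])
        if subprocess.call(full_cmd) == 0:
            return True
        if attempt < max_restarts:
            time.sleep(delay_s)  # short pause before restarting
    return False


# Hypothetical usage (paths and flag names are assumptions, not verified here):
#
#   train_cmd = [sys.executable, "pix2pix.py", "--mode", "train",
#                "--input_dir", "data/train", "--output_dir", "out",
#                "--save_freq", "500"]
#   run_with_restarts(train_cmd, resume_args=["--checkpoint", "out"])
```

Appending the resume arguments only on restarts lets the first run start from scratch while later runs pick up from the most recently saved model, which is why a shorter save interval limits how many steps are lost per crash.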