Open meihuabo opened 4 years ago
This error is usually caused by an optimizer learning rate that is set too high. Try training with a smaller learning rate.
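To illustrate why a learning rate that is too large can end in NaN, here is a toy sketch. It is not pix2pix code: the quadratic objective, the function name `gradient_descent`, and the step counts are all assumptions chosen for the demonstration.

```python
import math

def gradient_descent(lr, steps=2000, x0=1.0):
    """Minimise f(x) = x**2 with plain gradient descent and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # each step multiplies x by (1 - 2*lr)
    return x

# lr=1.5 multiplies x by -2 per step: it overflows to inf, and inf - inf gives NaN.
diverged = gradient_descent(lr=1.5)
# lr=0.1 multiplies x by 0.8 per step: it shrinks steadily toward the minimum at 0.
converged = gradient_descent(lr=0.1)

print("large lr ended in NaN:", math.isnan(diverged))
print("small lr converged:", abs(converged) < 1e-6)
```

A real network is far less tidy than this, but the failure has the same shape: once a weight becomes non-finite, every later summary of that weight contains NaN.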
Try TensorFlow v1.14.
This happens if you run TF2.0 in compatibility mode, whereas the v1.14 release mentioned by @skabbit works without the error. I can't see any major differences between the learning rate implementations in compatibility mode and in TF1.14, so I don't know where the issue is actually coming from.
I set up TF1.14 in a Conda environment and it hasn't crashed in 200 epochs. If you're using TF2.0 and have no virtual environments, a hacky solution is to loop the python command in a bash script and run that. Just remember to decrease the model save interval so you don't lose too many steps when it crashes and restarts. On the bright side, you can leave it running unattended without having to restart it manually.
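The same restart idea can be sketched in Python rather than bash. This is only a rough sketch: `run_with_restarts`, `max_restarts`, and `delay_seconds` are hypothetical names, and the demo command is a stand-in that exits cleanly, not a real training invocation.

```python
import subprocess
import sys
import time

def run_with_restarts(cmd, max_restarts=100, delay_seconds=0.0):
    """Run cmd until it exits with code 0 or the restart budget is used up.

    Returns a tuple (last_exit_code, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        exit_code = subprocess.run(cmd).returncode
        if exit_code == 0 or attempts > max_restarts:
            return exit_code, attempts
        time.sleep(delay_seconds)  # short pause before relaunching

# Stand-in command that exits successfully; swap in your own training command.
demo_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
print(run_with_restarts(demo_cmd))
```

For this to be useful, the training command has to resume from the most recent checkpoint on each relaunch. Otherwise every restart begins from scratch, and a shorter save interval does not help.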
Traceback (most recent call last):
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
[[{{node generator/encoder_1/conv2d/kernel/values}}]]
(1) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
[[{{node generator/encoder_1/conv2d/kernel/values}}]]
[[convert_inputs/convert_image/Minimum/_802]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pix2pix.py", line 815, in <module>
main()
File "pix2pix.py", line 781, in main
results = sess.run(fetches, options=options, run_metadata=run_metadata)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
[[node generator/encoder_1/conv2d/kernel/values (defined at /home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
[[node generator/encoder_1/conv2d/kernel/values (defined at /home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[convert_inputs/convert_image/Minimum/_802]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'generator/encoder_1/conv2d/kernel/values':
File "pix2pix.py", line 815, in <module>
main()
File "pix2pix.py", line 709, in main
tf.summary.histogram(var.op.name + "/values", var)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/summary/summary.py", line 179, in histogram
tag=tag, values=values, name=scope)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 329, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()