ISCAS007 / torchseg

use pytorch to do image semantic segmentation
GNU General Public License v3.0

2018-08-19 tensorflow deeplab v3 plus + semantic edges #12

Open yzbx opened 6 years ago

yzbx commented 6 years ago

error

(new) ➜  deeplab git:(master) sh test/train.sh 
current path is /home/yzbx/git/deeplab
/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
dataset cityscapes contain image 2975, label 2975
2018-08-19 22:49:48.210300: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-08-19 22:49:48.380085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:0a:00.0
totalMemory: 11.90GiB freeMemory: 11.75GiB
2018-08-19 22:49:48.380117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-19 22:49:48.698168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-19 22:49:48.698205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-08-19 22:49:48.698214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-08-19 22:49:48.698483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11359 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:0a:00.0, compute capability: 6.1)
image shape is (2, 1024, 2048, 3)
label shape is (2, 1024, 2048, 1)
image shape is (2, 769, 769, 3)
label shape is (2, 769, 769, 1)
semantic merged_logits (2, 193, 193, 19)
last layers is ['logits', 'image_pooling', 'aspp', 'concat_projection', 'decoder']
WARNING:tensorflow:Variable image_pooling/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp0/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/depthwise_weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/depthwise_weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/depthwise_weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable concat_projection/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/feature_projection0/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/feature_projection0/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/feature_projection0/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/feature_projection0/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/feature_projection0/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_depthwise/depthwise_weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_depthwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_depthwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_depthwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_depthwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_pointwise/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_pointwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_pointwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_pointwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv0_pointwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_depthwise/depthwise_weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_depthwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_depthwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_depthwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_depthwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_pointwise/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_pointwise/BatchNorm/gamma missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_pointwise/BatchNorm/beta missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_pointwise/BatchNorm/moving_mean missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_pointwise/BatchNorm/moving_variance missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable logits/semantic/weights missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
WARNING:tensorflow:Variable logits/semantic/biases missing in checkpoint /home/yzbx/git/deeplab/deeplab/datasets/weights/xception/model.ckpt
epoches is 1
step is 1487
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f397d769d68>>
Traceback (most recent call last):
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError: 
Traceback (most recent call last):
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value logits/semantic/biases/Momentum
     [[Node: Momentum/update_logits/semantic/biases/ApplyMomentum = ApplyMomentum[T=DT_FLOAT, use_locking=false, use_nesterov=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](logits/semantic/biases, logits/semantic/biases/Momentum, Select, gradients/logits/semantic/BiasAdd_grad/BiasAddGrad, PolynomialDecay/Cast_3/x)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test/deeplab_test.py", line 180, in <module>
    tf.app.run()
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "test/deeplab_test.py", line 174, in main
    net.train()
  File "/home/yzbx/git/deeplab/src/deeplab.py", line 210, in train
    i:d for i,d in zip(placeholders,np_values)})
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value logits/semantic/biases/Momentum
     [[Node: Momentum/update_logits/semantic/biases/ApplyMomentum = ApplyMomentum[T=DT_FLOAT, use_locking=false, use_nesterov=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](logits/semantic/biases, logits/semantic/biases/Momentum, Select, gradients/logits/semantic/BiasAdd_grad/BiasAddGrad, PolynomialDecay/Cast_3/x)]]

Caused by op 'Momentum/update_logits/semantic/biases/ApplyMomentum', defined at:
  File "test/deeplab_test.py", line 180, in <module>
    tf.app.run()
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "test/deeplab_test.py", line 174, in main
    net.train()
  File "/home/yzbx/git/deeplab/src/deeplab.py", line 209, in train
    sess.run(fetches=[optimizer.minimize(total_loss), total_loss], feed_dict={
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 424, in minimize
    name=name)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 617, in apply_gradients
    update_ops.append(processor.update_op(self, grad))
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 113, in update_op
    update_op = optimizer._apply_dense(g, self._v)  # pylint: disable=protected-access
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/training/momentum.py", line 98, in _apply_dense
    use_nesterov=self._use_nesterov).op
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/training/gen_training_ops.py", line 571, in apply_momentum
    name=name)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value logits/semantic/biases/Momentum
     [[Node: Momentum/update_logits/semantic/biases/ApplyMomentum = ApplyMomentum[T=DT_FLOAT, use_locking=false, use_nesterov=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](logits/semantic/biases, logits/semantic/biases/Momentum, Select, gradients/logits/semantic/BiasAdd_grad/BiasAddGrad, PolynomialDecay/Cast_3/x)]]
yzbx commented 6 years ago

not due to init_fn

diff --git a/deeplab/train.py b/deeplab/train.py
index 1a2576c..15e3c8b 100644
--- a/deeplab/train.py
+++ b/deeplab/train.py
@@ -385,12 +385,7 @@ def main(unused_argv):
             is_chief=(FLAGS.task == 0),
             session_config=session_config,
             startup_delay_steps=startup_delay_steps,
-            init_fn=train_utils.get_model_init_fn(
-                FLAGS.train_logdir,
-                FLAGS.tf_initial_checkpoint,
-                FLAGS.initialize_last_layer,
-                last_layers,
-                ignore_missing_vars=True),
+            init_fn=None,
             summary_op=summary_op,
             save_summaries_secs=FLAGS.save_summaries_secs,
             save_interval_secs=FLAGS.save_interval_secs)

With init_fn set to None, train.py runs without error:

(new) ➜  deeplab git:(master) ✗ sh test/train.sh
current path is /home/yzbx/git/deeplab
/home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
INFO:tensorflow:Training on train set
first clone label name is: label:0
WARNING:tensorflow:From /home/yzbx/bin/miniconda3/envs/new/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py:736: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-08-21 15:13:17.575832: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-08-21 15:13:17.752743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:0a:00.0
totalMemory: 11.90GiB freeMemory: 11.75GiB
2018-08-21 15:13:17.752789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-21 15:13:17.993605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-21 15:13:17.993641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-08-21 15:13:17.993651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-08-21 15:13:17.993918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11376 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:0a:00.0, compute capability: 6.1)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /home/yzbx/tmp/logs/tensorflow/deeplab/cityscapes/xception_65/2018-08-21__15-13-01/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
2018-08-21 15:13:35.167063: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.65GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-08-21 15:13:35.167138: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.65GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 10: loss = 7.7412 (0.978 sec/step)
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.

Dump the model variables whose names contain "logits":

<tf.Variable 'logits/semantic/weights:0' shape=(1, 1, 256, 19) dtype=float32_ref>
<tf.Variable 'logits/semantic/biases:0' shape=(19,) dtype=float32_ref>
<tf.Variable 'logits/semantic/weights/Momentum:0' shape=(1, 1, 256, 19) dtype=float32_ref>
<tf.Variable 'logits/semantic/biases/Momentum:0' shape=(19,) dtype=float32_ref>

suppose

yzbx commented 6 years ago

In the official code, the variable names are the same before training starts:

git diff deeplab/train.py
-        # Start the training.
-        slim.learning.train(
-            train_tensor,
-            logdir=FLAGS.train_logdir,
-            log_every_n_steps=FLAGS.log_steps,
-            master=FLAGS.master,
-            number_of_steps=FLAGS.training_number_of_steps,
-            is_chief=(FLAGS.task == 0),
-            session_config=session_config,
-            startup_delay_steps=startup_delay_steps,
-            init_fn=train_utils.get_model_init_fn(
-                FLAGS.train_logdir,
-                FLAGS.tf_initial_checkpoint,
-                FLAGS.initialize_last_layer,
-                last_layers,
-                ignore_missing_vars=True),
-            summary_op=summary_op,
-            save_summaries_secs=FLAGS.save_summaries_secs,
-            save_interval_secs=FLAGS.save_interval_secs)
-
+        
+        for var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
+            if var.name.find('logits')>=0:
+                print(var)

<tf.Variable 'logits/semantic/weights:0' shape=(1, 1, 256, 19) dtype=float32_ref>
<tf.Variable 'logits/semantic/biases:0' shape=(19,) dtype=float32_ref>
<tf.Variable 'logits/semantic/weights/Momentum:0' shape=(1, 1, 256, 19) dtype=float32_ref>
<tf.Variable 'logits/semantic/biases/Momentum:0' shape=(19,) dtype=float32_ref>
yzbx commented 6 years ago

call model.optimizer.minimize too late

The problem is that you call model.optimizer.minimize too late. This method creates additional tensors within your graph, so calling it within a loop is a bad idea: it is something similar to a memory leak. Also, in the case of stateful optimizers (such as AdamOptimizer), minimize creates additional variables. That's why you get the exception you described: your initializer runs before you create them. The solution for you will be to place the call to model.optimizer.minimize within the model class itself, and store its result in an attribute of the model. So, your problem does not refer to this issue.

reference
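
The ordering problem described in the quote can be sketched without TensorFlow. The block below is only a toy model: ToyGraph, ToyMomentumOptimizer and run_training are hypothetical names invented for illustration, not TensorFlow APIs. It assumes that an initializer covers only the variables that exist when it is built, and that minimize creates one extra "/Momentum" slot variable per trained variable, which matches the variable names dumped above.

```python
class ToyGraph:
    """Framework-free stand-in for a TF1-style graph (hypothetical)."""

    def __init__(self):
        self.variables = []       # names of every variable created so far
        self.initialized = set()  # names whose initializer has already run

    def create_variable(self, name):
        self.variables.append(name)
        return name

    def variables_initializer(self):
        # Like an init op, this captures only the variables that exist now.
        snapshot = list(self.variables)

        def run():
            self.initialized.update(snapshot)

        return run


class ToyMomentumOptimizer:
    """Stateful optimizer: building the update op creates slot variables."""

    def __init__(self, graph):
        self.graph = graph

    def minimize(self, var_names):
        slots = [self.graph.create_variable(v + "/Momentum") for v in var_names]

        def train_step():
            for name in list(var_names) + slots:
                if name not in self.graph.initialized:
                    raise RuntimeError(
                        "Attempting to use uninitialized value " + name)
            return "ok"

        return train_step


def run_training(build_train_op_first):
    graph = ToyGraph()
    weights = [graph.create_variable("logits/semantic/weights"),
               graph.create_variable("logits/semantic/biases")]
    optimizer = ToyMomentumOptimizer(graph)
    if build_train_op_first:
        train_step = optimizer.minimize(weights)  # slots exist before init
        graph.variables_initializer()()
    else:
        graph.variables_initializer()()           # init misses the slots
        train_step = optimizer.minimize(weights)
    return train_step()


if __name__ == "__main__":
    print(run_training(build_train_op_first=True))
    try:
        run_training(build_train_op_first=False)
    except RuntimeError as err:
        print(err)
```

In the failing run above, the traceback shows optimizer.minimize(total_loss) being built inside the sess.run call in train(). If the sketch's assumptions hold, the likely fix is to build that op once when the graph is constructed, run tf.global_variables_initializer() afterwards, and pass the stored op to sess.run in the loop.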

yzbx commented 6 years ago

TODO