CharlesShang / FastMaskRCNN

Mask RCNN in TensorFlow
Apache License 2.0

InternalError from WhereOp with tf-1.4.1 #189

Open huwh1 opened 6 years ago

huwh1 commented 6 years ago

There seems to be a compatibility problem with tensorflow-gpu 1.4.1. train.py runs under tf-gpu 1.2.1 with some warnings. Nevertheless, there is always an error that traces back to

File "train/../libs/layers/wrapper.py", line 172, in assign_boxes
    inds = tf.where(tf.equal(assigned_layers, l))

under tf-gpu 1.4.1. The error disappears if we set CUDA_VISIBLE_DEVICES="", although the message "no CUDA-capable device is detected" is then printed repeatedly. Our setup is CentOS 7.2.1511 and an NVIDIA K40c with driver 384.111, CUDA V8.0.61 and cuDNN 5.1.10.
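For anyone who wants to reproduce the CUDA_VISIBLE_DEVICES="" check from a script rather than the shell, here is a minimal sketch. The helper name `cpu_only_env` is made up for illustration and is not part of FastMaskRCNN. It only hides the GPUs from the child process, so training would fall back to CPU and run much slower; it is a way to confirm the failure is GPU-specific, not a fix.

```python
import os


def cpu_only_env(base_env=None):
    """Return a copy of the environment with every CUDA device hidden.

    `cpu_only_env` is a hypothetical helper for illustration only.
    """
    # Copy so the caller's environment (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    # An empty value hides all GPUs from the CUDA runtime in the child process.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


# Example use (assumes the script is started from the repository root):
#   import subprocess, sys
#   subprocess.call([sys.executable, "train/train.py"], env=cpu_only_env())
```

The variable has to be set before TensorFlow initializes CUDA, which is why the sketch passes it to a fresh child process instead of changing it after import.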

The full problematic log is attached here:

P2
P3
P4
P5
anchor_scales =  [8, 16, 32]
anchor_scales =  [4, 8, 16]
anchor_scales =  [2, 4, 8]
anchor_scales =  [1, 2, 4]
5
4
3
2
WARNING:tensorflow:From train/train.py:224: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-03-02 10:56:24.945523: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-03-02 10:56:25.343999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:06:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-02 10:56:25.344083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:06:00.0, compute capability: 3.5)
--restore_previous_if_exists is set, but failed to restore in ./output/mask_rcnn/ None
restoring  resnet_v1_50/conv1/weights:0
restoring  resnet_v1_50/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/weights:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/weights:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/weights:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/weights:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/weights:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/weights:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/BatchNorm/moving_variance:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/weights:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/moving_mean:0
restoring  resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/moving_variance:0
restoring  resnet_v1_50/logits/weights:0
restoring  resnet_v1_50/logits/biases:0
Restored 267(640) vars from ./data/pretrained_models/resnet_v1_50.ckpt
2018-03-02 10:56:39.155490: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid device function
2018-03-02 10:56:39.155816: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid device function
         [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
2018-03-02 10:56:39.156089: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid device function
         [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
2018-03-02 10:56:39.156227: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid device function
         [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
Traceback (most recent call last):
  File "train/train.py", line 339, in <module>
    train()
  File "train/train.py", line 271, in train
    [input_image] + [final_box] + [final_cls] + [final_prob] + [final_gt_cls] + [gt] + [tmp_0] + [tmp_1] + [tmp_2] + [tmp_3] + [tmp_4])
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid device function
         [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
         [[Node: pyramid_2/OneHotEncoding_4/one_hot/_1183 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9981_pyramid_2/OneHotEncoding_4/one_hot", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'pyramid_1/AssignGTBoxes/Where_4', defined at:
  File "train/train.py", line 339, in <module>
    train()
  File "train/train.py", line 193, in train
    loss_weights=[0.2, 0.2, 1.0, 0.2, 1.0])
  File "train/../libs/nets/pyramid_network.py", line 580, in build
    is_training=is_training, gt_boxes=gt_boxes)
  File "train/../libs/nets/pyramid_network.py", line 263, in build_heads
    assign_boxes(rois, [rois, batch_inds], [2, 3, 4, 5])
  File "train/../libs/layers/wrapper.py", line 172, in assign_boxes
    inds = tf.where(tf.equal(assigned_layers, l))
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 2439, in where
    return gen_array_ops.where(input=condition, name=name)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5930, in where
    "Where", input=input, name=name)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid device function
         [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
         [[Node: pyramid_2/OneHotEncoding_4/one_hot/_1183 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9981_pyramid_2/OneHotEncoding_4/one_hot", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

zhuoli1987 commented 6 years ago

I have the same problem with tf 1.3.0.