Closed mgarbade closed 7 years ago
You can modify the image either directly (when the ImageReader methods are defined), or in the main script (modifying a batch of images). In both cases you can just multiply the image (or a batch of images) by some mask.
Thanks a lot for your reply! I guess it is working. One more question if you don't mind. I'd like to finetune the model on a custom dataset, so I need to reinitialize the last layer randomly and reshape it to match the number of classes in the new dataset. It seemed to me like this is already done in your code in these lines:
# Create network.
net = DeepLabResNetModel({'data': image_batch})
# Predictions.
raw_output = net.layers['fc1_voc12']
prediction = tf.reshape(raw_output, [-1, n_classes])
label_proc = prepare_label(label_batch, tf.pack(raw_output.get_shape()[1:3]))
gt = tf.reshape(label_proc, [-1, n_classes])
But if I simply change n_classes it throws an error:
ValueError: Dimension size must be evenly divisible by 11 but is 141204 for 'Reshape' (op: 'Reshape') with input shapes: [4,41,41,21], [2].
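The numbers in this error can be checked by hand. A minimal sketch, assuming the batch of 4 images and the 41 x 41 x 21 output shape reported in the message; the variable names are illustrative only:

```python
# Shape of the network output reported in the error: [N, H, W, C]
batch, height, width, channels = 4, 41, 41, 21

total = batch * height * width * channels
print(total)        # 141204 elements in raw_output

# tf.reshape(raw_output, [-1, n_classes]) needs total % n_classes == 0.
print(total % 21)   # 0 -> works for the 21 VOC classes
print(total % 11)   # 8 -> fails for 11 classes

# With 21 channels every row of the reshaped tensor is one pixel.
rows_ok = total // 21
print(rows_ok)      # 6724 = 4 * 41 * 41
```

The last layer still emits 21 channels, so changing n_classes in the training script alone cannot make the reshape valid; the layer itself has to change as well.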
You will also need to change the model definition. In particular, you would need to replace 21 with your number of classes in the calls to atrous convolution.
Note also that if you change the number of classes, but keep the names of the layers intact, restoring the original model parameters would not be possible since the shapes of the layers are different. Besides renaming, you can overcome this issue with loading only the weights for all the layers except the last ones: please refer to another issue for that: #11.
How would I have to call the saver object in that case?
Right now it's called like this:
saver.restore(sess, 'ckpt_path/deeplab_resnet.ckpt')
It presumably fails since the saver object tries to restore the weights in ckpt_path/deeplab_resnet.ckpt for the exact network structure that it was saved for. In the link you posted you show how to get a list of the layers that need to be reinitialized:
not_restore = ['fc1_voc12_c0', 'fc1_voc12_c1', 'fc1_voc12_c2', 'fc1_voc12_c3']
restore_var = [v for v in tf.all_variables() if v.name not in not_restore]
But I'm not sure how to initialize this list (restore_var) together with the saver.restore method which loads the weights that are stored in the checkpoint file.
When you initialise the instance of the Saver class, you can pass the var_list argument, which specifies the variables that will be saved and restored. Then you can call the restore method as usual (all the variables from the restore_var list must be presented in the checkpoint file, otherwise it will raise an error; the inverse is not needed to be satisfied here: your checkpoint file can hold other variables that you don't want to restore).
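To illustrate how such a var_list could be built, here is a small sketch in plain Python. The Var tuple is only a stand-in for a TensorFlow variable, and the variable names are hypothetical (real names carry a scope and a suffix such as "/weights:0"). It also shows why comparing full names against bare layer names excludes nothing:

```python
from collections import namedtuple

# Stand-in for tf.Variable: only the .name attribute matters here.
Var = namedtuple("Var", ["name"])

# Hypothetical variable names, chosen for illustration.
all_vars = [
    Var("conv1/weights:0"),
    Var("bn_conv1/mean:0"),
    Var("fc1_voc12_c0/weights:0"),
    Var("fc1_voc12_c0/biases:0"),
    Var("fc1_voc12_c3/weights:0"),
]

not_restore = ["fc1_voc12_c0", "fc1_voc12_c1", "fc1_voc12_c2", "fc1_voc12_c3"]

# Exact matching never fires: no full name equals a bare layer name.
exact = [v for v in all_vars if v.name not in not_restore]

# Prefix matching drops every variable of the classifier layers.
restore_var = [v for v in all_vars
               if not any(v.name.startswith(p) for p in not_restore)]

print(len(exact))                     # 5: nothing was excluded
print([v.name for v in restore_var])  # ['conv1/weights:0', 'bn_conv1/mean:0']
```

A list filtered this way is what would be passed as var_list when constructing the Saver.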
Thanks to your help I modified the initialization part thus:
trainable = tf.trainable_variables()
optim = optimiser.minimize(reduced_loss, var_list=trainable)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
init = tf.initialize_all_variables() # All tensors in the graph are initialized with their initial values -> not clear what they are? All 0?
# Restore everything but the last layer
restore_var = [v for v in trainable if not v.name.startswith('fc1_voc12')] # Only excluding ['fc1_voc12_c0', 'fc1_voc12_c1', 'fc1_voc12_c2', 'fc1_voc12_c3'] is apparently not enough here
saver = tf.train.Saver(var_list=restore_var, max_to_keep=40)
saver.restore(sess, 'ckpt_path/deeplab_resnet.ckpt')
Now the network is compiling and starting to train, however the network is not converging (not even on Voc12 itself without having changed the number of classes). The loss is decreasing a bit in the beginning but remains on a high level. When looking at the output pictures one can see that the network first predicts noise and then only predicts the background class for all images (which is probably the dominant class in Voc12)
1) Maybe the last layer is not initialized with random noise?
2) Any idea how to go about the random initialization?
3) Maybe the tf.train.AdamOptimizer is the wrong optimizer here? (I remember that in Deeplab-Caffe it was some Gradient-Descent with Momentum)
4) Did you ever successfully finetune deeplab-resnet on a different dataset or did you just use it for inference so far?
Thanks a lot for your help so far. Again sorry for annoying you with this problem but I'm close to giving up :-/
There is a function def make_var which apparently is called to create all the variables in the network. I tried to add an initialization parameter there, but still no luck so far. It looks like this is the place to look at, though:
def make_var(self, name, shape):
    '''Creates a new TensorFlow variable.'''
    # return tf.get_variable(name, shape, trainable=self.trainable) # This was the old version
    return tf.get_variable(name, shape, trainable=self.trainable, initializer=tf.contrib.layers.xavier_initializer()) # This is the new one with explicit variable initialization
Ok, I found a way to make the model at least converge. I set the optimizer only to the last layer:
restore_var = [v for v in trainable if not v.name.startswith('fc1_voc12_c')]
not_restore_var = [v for v in trainable if v.name.startswith('fc1_voc12_c')]
optim = optimiser.minimize(reduced_loss, var_list=not_restore_var)
saver = tf.train.Saver(var_list=restore_var, max_to_keep=40)
if args.restore_from is not None:
    load(saver, sess, args.restore_from)
I'll try to run it with two different optimizers, one with a low learning rate for the earlier layers and one with a higher learning rate for the last layer. Hopefully that solves my problem. I will report back on the outcome...
Hi @mgarbade.
The divergence is happening because the batch normalisation layer used in kaffe-tensorflow is only tested for inference. It has been reported before: #5. For now, it is better to use another branch with correct batch normalisation. The script for fine-tuning is also provided there.
Let me know if the problem persists.
Well I'm happy to see that you are evolving your code :-). I'm still struggling to get the same performance on my dataset (CamVid - 11 classes) as with the Caffe-Model of Deep-Lab-Resnet. At the moment I'm still 20% off (IoU).
Differences are:
Here are their learning rate parameters:
base_lr: 2.5e-4 # base learning rate
weight_decay: 0.0005 # weights are regularized by adding L2-norm * weight_decay to the loss function afaik
learning rate scale factor for convolutions (everything but last layer):
lr_mult: 1
decay_mult: 1
learning rate scale factor for the atrous convolutions (classifier layers):
lr_mult: 10
decay_mult: 1
You are using Adam as optimizer. Instead the DeepLab people use Gradient-Descent with momentum and a decreasing learning rate. These are their parameters:
iter_size: 10
lr_policy: "poly"
power: 0.9
momentum: 0.9
Might be that this is not better than Adam, but maybe it allows to enforce a slower weight change for the old network compared to weight change in the classifier layers.
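For reference, the "poly" policy decays the learning rate as base_lr * (1 - iter / max_iter) ^ power. A minimal sketch in plain Python, using base_lr and power from the settings listed in this post; max_steps is an assumption (the thread does not state it), and the factor of 10 mirrors lr_mult: 10 for the classifier layers:

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Caffe-style "poly" policy: base_lr * (1 - step / max_steps) ** power."""
    return base_lr * (1.0 - float(step) / max_steps) ** power


base_lr = 2.5e-4    # base learning rate from the solver settings
max_steps = 20000   # assumption: not given in the thread

for step in (0, 10000, 19999):
    backbone_lr = poly_lr(base_lr, step, max_steps)   # lr_mult: 1
    classifier_lr = 10.0 * backbone_lr                # lr_mult: 10
    print(step, backbone_lr, classifier_lr)
```

In TensorFlow the two rates could be applied by creating two momentum optimizers over disjoint variable lists, which is one way to enforce the slower weight change for the pretrained layers.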
I was happy to see that you implemented an ignore_label class for the evaluation of IoU, but is that label also ignored during training / loss computation? I was looking for ways to implement this but haven't succeeded so far. I was thinking about taking the loss vector and setting all values corresponding to ignore_label to 0 in it:
loss = tf.nn.softmax_cross_entropy_with_logits(prediction, gt) # Dim = [6724,1] --> 6724 = 41 x 41 x 4 = H x W x N
loss_without_ignore_label = tf.select(gt_label != ignore_label, loss, tf.zeros_like(loss))
--> This is not implemented yet / not sure if it would work. tf.select(condition, TensorA, TensorB) checks for all entries in loss whether they correspond to an ignore_label value in the ground truth (gt_label) and sets them to 0. So they don't contribute to the loss.
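The masking idea can be tried on toy data without TensorFlow. The sketch below is only an illustration with made-up loss values and a hypothetical helper masked_mean_loss. It also shows a detail worth keeping in mind: zeroing ignored pixels and then averaging over all pixels dilutes the loss, while averaging over the valid pixels only does not.

```python
IGNORE_LABEL = 255


def masked_mean_loss(per_pixel_loss, labels, ignore_label=IGNORE_LABEL):
    """Average the loss over pixels whose label is not ignore_label."""
    kept = [l for l, y in zip(per_pixel_loss, labels) if y != ignore_label]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)


loss = [0.5, 2.0, 1.5, 4.0]   # made-up per-pixel losses
labels = [3, 255, 0, 255]     # two pixels carry the ignore label

# Variant from the post: set ignored entries to zero.
zeroed = [l if y != IGNORE_LABEL else 0.0 for l, y in zip(loss, labels)]
print(zeroed)                          # [0.5, 0.0, 1.5, 0.0]
print(sum(zeroed) / len(zeroed))       # 0.5, diluted by the ignored pixels
print(masked_mean_loss(loss, labels))  # 1.0, mean over valid pixels only
```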
Sorry for the long post. I will keep on trying to push the fine-tuning performance and let you know if I can make it... Thanks
Thank you for your description.
I will look closely at ignore_label during training, and will try to provide a training script that better resembles the original procedure.
Thanks a lot for providing the ignore_label feature. I'm still looking forward to close the 10% performance gap compared to the caffe version of deeplab-resnet.
I identified some more differences:
scale_factors: 0.5
scale_factors: 0.75
scale_factors: 1
scale_factors: 1.25
scale_factors: 1.5
I will try to convert the caffemodel. Hopefully that will allow me to close the performance gap...
I just saw that the model pretrained on ms-coco was just the exact same that you provided as init model.
I further tried to monitor the development of the variables of the neural network by adding tf.summary.histogram loggers to all trainable variables in the network like this (based on your train.txt in the train-orig
for v in conv_trainable + fc_w_trainable + fc_b_trainable: # Add histogram to all variables
    tf.summary.histogram(v.name.replace(":", "_"), v)
merged_summary_op = tf.summary.merge_all()
It looks like only the last layer is learning something, the earlier layers seem to not change at all:
is the first convolution layer. The other layers look the same.
and fc1_voc12_c0/weights_0
are the convolution weights from the last layers. Here, at least the bias is changing. Weights are again almost unchanged.
This pattern stays the same for more iterations... I will play around with the learning rate, but it seems like the optimization is not working correctly...
Might also be that the loss that is computed in the Caffe version is much higher since they use an accumulative loss. Apparently they add up the loss over 10 iterations (iter_size = 10) while using a batch_size of 3. Only after that do they perform the backpropagation. So maybe their effective batch size is 30, which then produces a higher loss as compared to the batch_size of 10 which is used here. Could be that this is the reason why earlier layers have trouble learning...
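A toy sketch of what accumulating over iter_size passes amounts to. It assumes the accumulated gradients are averaged over iter_size before the weight update; that is an assumption about the solver, not something confirmed in this thread. The function name and the gradient values are made up for illustration:

```python
def accumulate_gradients(micro_batch_grads):
    """Average the gradients of several forward/backward passes."""
    iter_size = len(micro_batch_grads)
    n_params = len(micro_batch_grads[0])
    total = [0.0] * n_params
    for grads in micro_batch_grads:
        for i, g in enumerate(grads):
            total[i] += g
    return [t / iter_size for t in total]


batch_size, iter_size = 3, 10
effective_batch = batch_size * iter_size
print(effective_batch)   # 30 images contribute to each update

# Two passes over a model with two parameters (toy numbers).
avg = accumulate_gradients([[1.0, -2.0], [3.0, 0.0]])
print(avg)               # [2.0, -1.0]
```

Under that assumption the update has the scale of a single large batch rather than a tenfold larger one, so the magnitude of the loss alone would not explain the difference.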
@mgarbade, where is batch_size=3 coming from? In the train.prototxt provided by the authors, it is 1, isn't it?
In the original implementation, they also use 4 losses as I mentioned here, which should improve the gradient flow, as well.
Besides tracking the raw variable values, try also to track the ratio between gradient updates and parameter values (it is, in my opinion, a better indicator of whether the layer is learning something or not). Here is some pseudo code from the Karpathy's class on CNNs:
# assume parameter vector W and its gradient vector dW
param_scale = np.linalg.norm(W.ravel())
update = -learning_rate*dW # simple SGD update
update_scale = np.linalg.norm(update.ravel())
W += update # the actual update
print update_scale / param_scale # want ~1e-3
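The same check can be written as a small runnable function. This is a sketch in plain Python 3 without NumPy; the helper names and the toy numbers are made up, and it assumes a plain SGD update as in the pseudo code:

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def update_ratio(weights, grads, learning_rate):
    """Norm of a plain SGD update divided by the norm of the weights."""
    update = [-learning_rate * g for g in grads]
    return l2_norm(update) / l2_norm(weights)


weights = [3.0, 4.0]   # norm 5
grads = [0.6, 0.8]     # norm 1
ratio = update_ratio(weights, grads, learning_rate=0.005)
print(ratio)           # roughly 1e-3, the range suggested above
```

Ratios far below 1e-3 for the early layers would support the observation that those layers barely change.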
You are right. In the original code with the multiscale fusion, they have a batch_size of 1. In my version I had the multiscale part removed so I could have a batch_size of 3. Sorry for the confusion.
I'm not so sure about the individual losses for the different branches. Do they simply add them to the gradients during backpropagation or how do they combine them?
Good idea with the update_scale / param_scale! I'll check that. By the way: I updated my preprocessing to random cropping and 0-padding (images) / ignore_label-padding (labels). It gave me a huge boost (+10 % accuracy) on my other datasets (CamVid and Cityscapes). So although this might not be very important for Pascal Voc12, it apparently is for other datasets.
Here is how I preprocess images at the moment:
def read_images_from_disk(input_queue, img_type, phase, input_size=(321, 321), ignore_label=255):
    img_contents = tf.read_file(input_queue[0])
    label_contents = tf.read_file(input_queue[1])
    if img_type == 1:
        img = tf.image.decode_jpeg(img_contents, channels=3) # VOC12
    else: # assumed: any other img_type is a PNG dataset
        img = tf.image.decode_png(img_contents, channels=3) # CamVid
    label = tf.image.decode_png(label_contents, channels=1)
    # Change RGB to BGR
    img_r, img_g, img_b = tf.split(split_dim=2, num_split=3, value=img)
    img = tf.cast(tf.concat(2, [img_b, img_g, img_r]), dtype=tf.float32)
    # Mean subtraction
    IMG_MEAN = tf.constant([104.00698793, 116.66876762, 122.67891434], shape=[1, 1, 3], dtype=tf.float32) # BGR
    IMG_MEAN = tf.reshape(IMG_MEAN, [1, 1, 3])
    img = img - IMG_MEAN
    # Optional preprocessing for training phase
    if phase == 'train':
        img, label = preprocess_input_train(img, label, ignore_label)
    elif phase == 'valid':
        # TODO: Perform only a central crop -> size should be the same as during training
        pass
    elif phase == 'test':
        pass
    return img, label
def preprocess_input_train(img, label, ignore_label):
    # Scale
    scale = tf.random_uniform([1], minval=0.5, maxval=1.5, dtype=tf.float32, seed=None)
    h_new = tf.to_int32(tf.mul(tf.to_float(tf.shape(img)[0]), scale))
    w_new = tf.to_int32(tf.mul(tf.to_float(tf.shape(img)[1]), scale))
    new_shape = tf.squeeze(tf.pack([h_new, w_new]), squeeze_dims=[1])
    img = tf.image.resize_images(img, new_shape)
    label = tf.image.resize_nearest_neighbor(tf.expand_dims(label, 0), new_shape)
    label = tf.squeeze(label, squeeze_dims=[0])
    # Mirror
    random_number = tf.random_uniform([2], 0, 1.0, dtype=tf.float32)
    img = image_mirroring(img, random_number)
    label = image_mirroring(label, random_number)
    # Crop and pad image
    label = tf.cast(label, dtype=tf.float32) # Needs to be subtract and later added due to 0 padding
    label = label - ignore_label
    crop_h, crop_w = [321, 321]
    img_crop, label_crop = random_crop_and_pad_image_and_labels(img, label, crop_h, crop_w)
    label_crop = label_crop + ignore_label
    label_crop = tf.cast(label_crop, dtype=tf.uint8)
    # Set static shape so that tensorflow knows shape at compile time
    img_crop.set_shape((crop_h, crop_w, 3))
    label_crop.set_shape((crop_h, crop_w, 1))
    return img_crop, label_crop
def image_mirroring(image, random_number):
    distort_left_right_random = random_number[0]
    mirror = tf.less(tf.pack([1.0, distort_left_right_random, 1.0]), 0.5)
    image = tf.reverse(image, mirror)
    return image
and for cropping with padding
def random_crop_and_pad_image_and_labels(image, labels, crop_h, crop_w):
    combined = tf.concat(2, [image, labels])
    image_shape = tf.shape(image)
    combined_pad = tf.image.pad_to_bounding_box(
        combined, 0, 0,
        tf.maximum(crop_h, image_shape[0]),
        tf.maximum(crop_w, image_shape[1]))
    last_image_dim = tf.shape(image)[-1]
    last_label_dim = tf.shape(labels)[-1]
    combined_crop = tf.random_crop(combined_pad, [crop_h, crop_w, 4]) # TODO: Make cropping size a variable
    return (combined_crop[:, :, :last_image_dim],
            combined_crop[:, :, last_image_dim:])
Mind that the padding for the labels has to be done with "ignore_label". Since TF only performs a 0-padding I'm subtracting the ignore_label from label and add it again after the padding.
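The subtract / pad / add trick is easy to verify on a tiny label map. A plain-Python sketch follows; zero_pad is a hypothetical stand-in for a zero-padding op and pads at the bottom and right only:

```python
def zero_pad(rows, target_h, target_w):
    """Pad a 2-D list with zeros at the bottom and on the right."""
    padded = [row + [0] * (target_w - len(row)) for row in rows]
    padded += [[0] * target_w for _ in range(target_h - len(rows))]
    return padded


def pad_labels(label, target_h, target_w, ignore_label=255):
    """Pad a label map so that the new pixels end up as ignore_label."""
    shifted = [[v - ignore_label for v in row] for row in label]   # subtract
    padded = zero_pad(shifted, target_h, target_w)                 # 0-padding
    return [[v + ignore_label for v in row] for row in padded]     # add back


label = [[0, 1],
         [2, 3]]
print(pad_labels(label, 3, 3))
# [[0, 1, 255], [2, 3, 255], [255, 255, 255]]
```

The original labels come back unchanged, and every padded pixel carries ignore_label, so it can be excluded from the loss.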
I'm not so sure about the individual losses for the different branches. Do they simply add them to the gradients during backpropagation or how do they combine them?
Yes, the Caffe mechanism takes care of that and adds all the gradients.
Nice work with pre-processing! Would be great if you could wrap it up as a PR :)
Thanks. I'm a bit busy at the moment so I just made a dirty PR from the last state of my fork. When I have more time I will clean up the code and make a better PR. At the very least the dirty PR contains the random image cropping and padding part (same functions as the ones I posted above).
@DrSleep: The batch-norm branch has probably been merged into master, as I did not see any difference between master and batch-norm.
I modified the model definition as follows:
n_classes = 6
.atrous_conv(3, 3, n_classes, 6, padding='SAME', relu=False, name='fc1_voc12_c0'))
.atrous_conv(3, 3, n_classes, 12, padding='SAME', relu=False, name='fc1_voc12_c1'))
.atrous_conv(3, 3, n_classes, 18, padding='SAME', relu=False, name='fc1_voc12_c2'))
.atrous_conv(3, 3, n_classes, 24, padding='SAME', relu=False, name='fc1_voc12_c3'))
Fine-tuning starts (without converging) with n_classes = 21, but hits the following error when I change it to 6, which is the actual number of classes in my custom dataset:
ValueError: Dimension 0 in both shapes must be equal, but are 6724 and 23534 for 'SoftmaxCrossEntropyWithLogits' (op: 'SoftmaxCrossEntropyWithLogits') with input shapes: [6724,6], [23534,6].
Where is that multiplier of 1120.6667 (6724/6, 23534/21) really coming from? What other layers should I change?
You should change n_classes in the training script that you are using as well. Also modify accordingly.
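The two row counts in the error are consistent with the labels still being encoded with 21 classes while the predictions already use 6. A quick arithmetic check, assuming a batch of 4 and a 41 x 41 output as in the earlier error message in this thread:

```python
batch, height, width = 4, 41, 41
pixels = batch * height * width
print(pixels)          # 6724 rows in the prediction, one per pixel

old_classes, new_classes = 21, 6

# Labels one-hot encoded with 21 classes, then reshaped to [-1, 6]:
label_elements = pixels * old_classes
label_rows = label_elements // new_classes
print(label_elements)  # 141204
print(label_rows)      # 23534 rows, the second shape in the error
```

So the factor is simply pixels * 21 / 6, which goes away once the label preparation uses the same number of classes as the network.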
Thanks for the help @DrSleep , I did not indeed notice that those had been defined there as well.
I hit the following now which probably means that the restore part is not functioning properly:
Caused by op u'save_1/Assign_419', defined at:
File "", line 201, in <module>
File "", line 179, in main
loader = tf.train.Saver(var_list=restore_var)
File "/home/petteri/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/", line 1056, in __init__
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [6] rhs shape= [21]
[[Node: save_1/Assign_417 = Assign[T=DT_FLOAT, _class=["loc:@fc1_voc12_c0/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](fc1_voc12_c0/biases, save_1/RestoreV2_417/_13)]]
And got the fine-tuning to start with the tweaks by @mgarbade though
You are right that the restore part is not functioning properly: the last layers of your network differ from the original one (6 vs. 21 feature maps), thus when restoring you are receiving the error.
The solution is to restore all the layers but the last ones (fc1): instead of restore_var = tf.global_variables(), you should use restore_var = [v for v in tf.global_variables() if 'fc1' not in v.name]
@petteriTeikari Have you solved the problem? I have the same problem!
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2] rhs shape= [21] [[Node: save_1/Assign_417 = Assign[T=DT_FLOAT, _class=["loc:@fc1_voc12_c0/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](fc1_voc12_c0/biases, save_1/RestoreV2_417)]]
How can I modify images on the fly? Say I would like to set a certain region of the input images to 0? Where in your code would I need to do the surgery for that?
Rather in the function where the image is loaded? Or rather in the network graph itself, say by adding a layer after the data layer that multiplies elementwise with some mask?
Sorry for bothering you with this stupid question. I'm new to tensorflow. Also sorry for asking a usage question, but since your code differs quite a lot from the tensorflow tutorial code I don't really know where else to turn for that question...
Thanks a lot for providing the deeplab-resnet model for tensorflow!