howard-mahe closed this issue 6 years ago
Great. I totally agree with you about that. You will have to wait for the author's reply, because he may be busy at the moment. To achieve the performance the paper reports, the code needs some changes, including training batch normalization and fine-tuning from a pretrained model. Otherwise, you cannot reach the target performance. I have used a pretrained model and got about 74% (the paper reports 75.78%). I am still checking why I am losing 1.8%.
@howard-mahe: I found other bugs. The tf-slim pre-processing does the mean subtraction like this:
```python
channels = tf.split(axis=2, num_or_size_splits=num_channels, value=image)
for i in range(num_channels):
    channels[i] -= means[i]
return tf.concat(axis=2, values=channels)
```

with the per-channel means:

```python
_R_MEAN = 123.68
_G_MEAN = 116.78
_B_MEAN = 103.94
```
We should use the same pre-processing as tf-slim did. Correct me if I am wrong?
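For reference, a minimal numpy sketch of the same per-channel mean subtraction (not the repo's code; it assumes an HxWx3 image already decoded in RGB order):

```python
import numpy as np

# tf-slim's VGG preprocessing means, in RGB order
_R_MEAN, _G_MEAN, _B_MEAN = 123.68, 116.78, 103.94

def mean_image_subtraction(image, means=(_R_MEAN, _G_MEAN, _B_MEAN)):
    image = np.asarray(image, dtype=np.float32)
    # Broadcasting subtracts one mean per channel, matching the
    # tf.split / subtract / tf.concat sequence above.
    return image - np.asarray(means, dtype=np.float32)

img = np.full((2, 2, 3), 128.0, dtype=np.float32)
out = mean_image_subtraction(img)
```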
The mean should be given in RGB order in voc.py at l.31 and l.38, because:
a) tf.gfile.FastGFile.read() (in convert_voc12.py at l.62) loads RGB images;
b) TF-Slim's models have also been trained with RGB images.
It's a common mistake among TF-Slim users, since Caffe is based on OpenCV, which loads BGR images.

Ok, thank you for the details! But why use (an alternative) ImageNet mean instead of VOC's mean?
Because tf-slim used the ImageNet mean in its preprocessing to produce the pretrained model.
This brings us back to a common question: during fine-tuning, should we use the fine-tuning dataset's image mean or the pre-training dataset's image mean? As I said, I believe (and the FCN authors do too) that the role of preprocessing is to zero-center input images, so I vote for using the fine-tuning dataset's image mean.
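The "fine-tuning dataset mean" option can be sketched as computing a per-channel mean over the fine-tuning images instead of reusing the ImageNet mean (the tiny dataset below is a stand-in, not VOC):

```python
import numpy as np

def dataset_channel_mean(images):
    """Per-channel mean over an iterable of HxWx3 RGB arrays."""
    total = np.zeros(3, dtype=np.float64)
    count = 0
    for img in images:
        total += img.reshape(-1, 3).sum(axis=0)
        count += img.shape[0] * img.shape[1]
    return total / count

imgs = [np.full((4, 4, 3), 100.0), np.full((2, 2, 3), 50.0)]
mean = dataset_channel_mean(imgs)  # (16*100 + 4*50) / 20 = 90 per channel
```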
Are you sure (103.94, 116.78, 123.68) is not the VOC11 mean? It looks like it is, according to the FCN repo, which uses nyud/pascalcontext/siftflow's mean when fine-tuning on those datasets, and also uses (103.94, 116.78, 123.68) as the mean for VOC fine-tuning (link).
FCN used a VGG pretrained model that was trained with that image mean, so you see that value in the FCN case. However, for Pascal VOC it may differ. Btw, I have tested with two different image means and the performance is not much different, so we can use either image mean. Only the channel order matters. Let me know if you find any other bugs in this code. I am still 1.7% below the paper's reported performance.
@howard-mahe @John1231983 Thanks for your suggestions, I will take some time to look at it. I was working on something else the past two months.
@howard-mahe Hi, Howard. I think you are right about the multi-grid implementation. Thanks for digging into my crappy code. I will update the code ASAP. Your explanation of the 1x1 projection inspires me a lot.
@John1231983 Hi, John. Yes, tf.gfile.FastGFile.read() reads images as RGB, while IMG_MEAN = np.array((104.00698793, 116.66876762, 122.67891434), dtype=np.float32) corresponds to BGR. I have seen literature that did the same thing I did. I am not sure which mode Slim used to train on ImageNet, but after a long training run the network will correct itself, since mean subtraction is only a preprocessing step, not a decisive thing. Checking the preprocessing for VGG input, I saw both preprocessing methods reported as workable.
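To make the order mismatch concrete, here is a small sketch (mine, not the repo's code): if images are decoded as RGB but the mean triple is stored in BGR order, reversing the triple aligns it with the image's channel order.

```python
import numpy as np

# The mean used in the code, in BGR (Caffe/OpenCV) order
IMG_MEAN_BGR = np.array((104.00698793, 116.66876762, 122.67891434), np.float32)

# Reversing the channel axis gives the same mean in RGB order,
# matching images loaded via tf.gfile.FastGFile.read()
IMG_MEAN_RGB = IMG_MEAN_BGR[::-1]
```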
I will rerun the program to update the results, the previous performance seems to be a combination of luck and overfitting.
Happy to see you again. You can also look at my question "Training with Batch normalization". We also discuss some bugs in your code that explain why your performance is higher than the paper's.
@NanqingD Thanks for your feedback! I also believe that IMG_MEAN doesn't matter that much in the end. @John1231983 spotted the most important bug, about validation being performed on the train set... Great code anyway, and thank you for the upcoming updates!
Yes I think that https://github.com/NanqingD/DeepLabV3-Tensorflow/issues/11 is the most important one.
@howard-mahe Hi, sorry to bother you again. Do you mind checking my ASPP implementation? I found that others implement ASPP without a ReLU, and I am not sure about this because the authors didn't release the source code.
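The ambiguity above is only about whether each ASPP branch output gets a ReLU before concatenation. A minimal sketch of that merging step (the branch outputs below are random stand-ins for the 1x1 conv, the three atrous convs, and image pooling; this is not the paper's implementation, which was never released):

```python
import numpy as np

def merge_aspp_branches(branch_outputs, use_relu=True):
    """Optionally apply ReLU to each branch, then concat along channels."""
    if use_relu:
        branch_outputs = [np.maximum(x, 0.0) for x in branch_outputs]
    return np.concatenate(branch_outputs, axis=-1)

# 5 branches of shape (N, H, W, 256): 1x1 conv, 3 atrous convs, image pooling
branches = [np.random.randn(1, 8, 8, 256) for _ in range(5)]
out = merge_aspp_branches(branches, use_relu=True)
```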
If somebody is interested: https://github.com/tensorflow/models/blob/master/research/deeplab/README.md
@bhack We have waited a long time for this. Thanks so much for the info.
Hi Nanqing,
First, thanks a lot for your implementation, this is a great piece of work!
I feel like you misunderstood the Multigrid block of the DeepLabV3 network. You create a bottleneck_hdc unit and then repeat the bottleneck_hdc unit 3 times. In a ResNet bottleneck, conv1 is a decreasing projection, conv2 is a 3x3 convolution where the dilation is supposed to happen, and conv3 is an increasing projection. Please note that the projections are 1x1 convolutions, for which dilation_rate doesn't have any effect.

What is described in DeepLabV3 for block n, n={4,5,6,7}, is a succession of 3 standard bottleneck_v1 units whose dilation rates are multi_grid=(1,2,1) for each of the 3 units, respectively. The corrected code should be:
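The corrected snippet itself is not preserved in this thread, but the scheme it implements can be sketched with a hypothetical helper (names are mine): each block is 3 standard bottleneck_v1 units, only conv2 (the 3x3 conv) of each unit is dilated, at base_rate * multi_grid[i], and the 1x1 projections conv1/conv3 are unaffected by dilation.

```python
def unit_dilation_rates(base_rate, multi_grid=(1, 2, 1)):
    """Dilation rate used in conv2 of each of the 3 bottleneck units."""
    return [base_rate * m for m in multi_grid]

# e.g. a block running at a base atrous rate of 2
rates = unit_dilation_rates(2)
```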