Closed voqtuyen closed 6 years ago
The reduce_mean results in a form of spatial invariance, which on Market-1501 is probably rather important.
Apart from that, what you do creates a HUGE dimensionality for your embedding. For one, you encode the spatial location of things in your embedding, which is probably not smart; additionally, you make training a lot easier because the network can move things around more freely in this huge space, probably resulting in a less general model. Those are just my intuitions, but I'm sure about the spatial location part at least.
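To make the intuition concrete, here is a small NumPy sketch (the 7x7x1024 shape is an assumption, matching MobileNet's typical final feature map for 224x224 input): flattening blows up the embedding dimensionality and bakes spatial position into it, while the spatial mean does not.

```python
import numpy as np

# Hypothetical final conv feature map for one image:
# 7x7 spatial grid, 1024 channels (typical MobileNet output for 224x224 input).
feat = np.random.rand(7, 7, 1024)

# Flattening keeps every spatial position as its own set of dimensions...
flat_dim = feat.reshape(-1).shape[0]        # 7 * 7 * 1024 = 50176
# ...while averaging over the spatial axes collapses them.
mean_dim = feat.mean(axis=(0, 1)).shape[0]  # 1024

# Spatial invariance: circularly shifting the feature map changes the
# flattened vector but leaves the spatial mean untouched.
shifted = np.roll(feat, shift=2, axis=0)
print(flat_dim, mean_dim)
print(np.allclose(feat.mean(axis=(0, 1)), shifted.mean(axis=(0, 1))))  # True
print(np.allclose(feat.reshape(-1), shifted.reshape(-1)))              # False
```

So a flattened embedding is ~49x larger and distinguishes *where* a feature fired, not just whether it fired.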
You could do it with average_pooling2d if you assume your images always have the same size and use that size as the pooling window. But I don't see any advantage over reduce_mean.
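A minimal sketch of that equivalence, using a plain-NumPy stand-in for the pooling op (the `avg_pool2d` helper below is illustrative, not the TensorFlow implementation; NHWC layout and VALID padding are assumed): average pooling with a window covering the whole spatial extent yields the same numbers as a mean over the spatial axes, i.e. what `tf.reduce_mean(x, [1, 2])` computes.

```python
import numpy as np

def avg_pool2d(x, pool_size, strides):
    """Illustrative stand-in for tf.layers.average_pooling2d
    (NHWC layout, VALID padding)."""
    b, h, w, c = x.shape
    ph, pw = pool_size
    sh, sw = strides
    oh = (h - ph) // sh + 1
    ow = (w - pw) // sw + 1
    out = np.empty((b, oh, ow, c))
    for i in range(oh):
        for j in range(ow):
            window = x[:, i * sh:i * sh + ph, j * sw:j * sw + pw, :]
            out[:, i, j, :] = window.mean(axis=(1, 2))
    return out

x = np.random.rand(2, 7, 7, 1024)  # batch of final 7x7 feature maps
pooled = avg_pool2d(x, pool_size=(7, 7), strides=(1, 1))  # shape (2, 1, 1, 1024)
reduced = x.mean(axis=(1, 2))      # the reduce_mean over spatial axes
print(np.allclose(pooled[:, 0, 0, :], reduced))  # True
```

The pooled output just carries two extra size-1 spatial dims that you'd squeeze away, which is why the full-window pooling route buys you nothing over reduce_mean, while the latter also handles varying input sizes.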
I'm closing this for now. If you have further questions feel free to re-open.
Thanks for providing the source code of the paper. I have a question regarding the final layer of MobileNet: what is the purpose of the reduce_mean operation here? https://github.com/VisualComputingInstitute/triplet-reid/blob/2760af1589f558f0f061855e72646a5c1dffe3db/nets/mobilenet_v1_1_224.py#L16 When I replace it, the model accuracy seems to decrease a lot. Is it possible to use tf.layers.average_pooling2d instead, and how?
Thanks