VisualComputingInstitute / triplet-reid

Code for reproducing the results of our "In Defense of the Triplet Loss for Person Re-Identification" paper.
https://arxiv.org/abs/1703.07737
MIT License
764 stars · 216 forks

Question about final layer of mobilenet #61

Closed voqtuyen closed 6 years ago

voqtuyen commented 6 years ago

Thanks for providing the source code of the paper. I have a question regarding the final layer of MobileNet. What is the purpose of the reduce_mean operation here? https://github.com/VisualComputingInstitute/triplet-reid/blob/2760af1589f558f0f061855e72646a5c1dffe3db/nets/mobilenet_v1_1_224.py#L16 When I replace it with

endpoints['model_output'] = endpoints['global_pool'] = tf.reshape(endpoints['Conv2d_13_pointwise'], [-1, 8 * 4 * 1024])

the model accuracy seems to decrease a lot. Is it possible to use tf.layers.average_pooling2d instead, and if so, how?
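To make the shape difference between the two variants concrete, here is a minimal NumPy sketch. The 8×4×1024 feature-map shape is assumed from the reshape above (MobileNet on 256×128 person crops); `feats` stands in for the `Conv2d_13_pointwise` endpoint:

```python
import numpy as np

# Stand-in for the Conv2d_13_pointwise feature map: [batch, 8, 4, 1024]
# (batch size and spatial shape assumed for illustration).
feats = np.random.randn(2, 8, 4, 1024).astype(np.float32)

# What the repo's tf.reduce_mean(..., [1, 2]) does: average over the
# spatial axes, giving a 1024-d embedding per image.
pooled = feats.mean(axis=(1, 2))
print(pooled.shape)  # (2, 1024)

# What the proposed tf.reshape does instead: flatten everything into a
# 32768-d embedding that keeps spatial position encoded in the vector.
flat = feats.reshape(-1, 8 * 4 * 1024)
print(flat.shape)  # (2, 32768)
```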

Thanks

Pandoro commented 6 years ago

The reduce_mean results in a form of spatial invariance, which on Market-1501 is probably rather important.

Apart from that, what you do is create a HUGE dimensionality for your embedding. For one, you encode the spatial location of things in your embedding, which is probably not smart. Additionally, you make training a lot easier, because the network can move things around more freely in this huge space, probably resulting in a less general model. Those are just my intuitions, but I'm sure about the spatial location at least.

You could do it with average_pooling2d if you assume your images always have the same size and you use the full feature-map size as the pooling window. But I don't see any advantage over reduce_mean.
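A minimal sketch of that equivalence, assuming a fixed 8×4 feature map; `tf.nn.avg_pool2d` is used here in place of the older `tf.layers.average_pooling2d`, and the tensor is random stand-in data:

```python
import tensorflow as tf

# Stand-in feature map: [batch, 8, 4, 1024] (shape assumed for illustration).
feats = tf.random.normal([2, 8, 4, 1024])

# Global average via reduce_mean, as the repo does.
a = tf.reduce_mean(feats, axis=[1, 2])

# Average pooling with the full spatial extent as the window; this only
# works if every input produces the same fixed 8x4 feature-map size.
b = tf.nn.avg_pool2d(feats, ksize=[8, 4], strides=[8, 4], padding='VALID')
b = tf.reshape(b, [-1, 1024])

# Both paths yield the same 1024-d embedding (up to float rounding).
print(float(tf.reduce_max(tf.abs(a - b))))
```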

I'm closing this for now. If you have further questions feel free to re-open.