Models question

Thank you for your excellent work. I have some questions:

Why did you choose the ResNet-50 network trained on ImageNet [final fully connected layer] and the VGG model trained on the Places365 dataset [512-dimensional output of the final pooling layer] as the models to compare?
If the network outputs a feature map of shape W×H×D, how should that tensor be processed? Does it need to be aggregated into a single feature vector, or can metric learning be applied to it directly?
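
To make the second question concrete, here is a minimal sketch of the two options I have in mind. It assumes the feature map is a NumPy array of shape (W, H, D). The 7x7x512 size and the function names are only illustrative and are not taken from the repository.

```python
import numpy as np


def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Aggregate a (W, H, D) feature map into one D-dimensional vector."""
    if feature_map.ndim != 3:
        raise ValueError("expected a feature map of shape (W, H, D)")
    # Average over the two spatial axes, keeping the channel axis.
    return feature_map.mean(axis=(0, 1))


def l2_normalize(vec: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit length, as is common before distance-based losses."""
    return vec / max(float(np.linalg.norm(vec)), eps)


# Hypothetical feature map; random values stand in for real network output.
rng = np.random.default_rng(0)
feature_map = rng.random((7, 7, 512))

# Option A: aggregate first, then feed the D-dimensional vector to metric learning.
pooled = l2_normalize(global_average_pool(feature_map))

# Option B: no aggregation, flatten the whole tensor into one long vector.
flattened = feature_map.reshape(-1)

print(pooled.shape, flattened.shape)
```

Option A gives a 512-dimensional descriptor regardless of the spatial size, while Option B grows with W and H. Which of these (if either) matches what you did?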

GWUvision / Hotels-50K