Hello
In our experiments, using GeM + FC2048 (the first option you mention) was one of the best configurations; it was better than keeping GeM alone with the 1024-d output. I have never tried the second option you mention, but I would suggest using the first one, which doesn't even require adding any code.
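For clarity, GeM + FC2048 is just generalized-mean pooling followed by a linear projection on top of the backbone features. A minimal PyTorch sketch of that head (simplified and self-contained, not the exact benchmark code; the class names here are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized Mean pooling: (mean(x^p))^(1/p) over the spatial dimensions."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)  # learnable pooling exponent
        self.eps = eps

    def forward(self, x):
        # x: (B, C, H, W) feature map from the backbone
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.avg_pool2d(x, kernel_size=(x.size(-2), x.size(-1)))  # (B, C, 1, 1)
        return x.pow(1.0 / self.p).flatten(1)                      # (B, C)

class GeMWithFC(nn.Module):
    """GeM pooling followed by a fully connected projection to out_dim."""
    def __init__(self, in_dim=1024, out_dim=2048):
        super().__init__()
        self.gem = GeM()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        x = self.gem(x)                     # (B, in_dim), e.g. 1024-d from ResNet101conv4
        x = self.fc(x)                      # (B, out_dim), e.g. 2048-d descriptor
        return F.normalize(x, p=2, dim=1)   # L2-normalized output
```

The FC simply maps the 1024-d pooled feature to 2048-d, so the backbone itself is left untouched.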
Lastly, if you are looking for a model that outputs 2048-d descriptors, you can check out one of our latest works, EigenPlaces; it performs much better than models trained with a triplet loss: https://github.com/gmberton/EigenPlaces
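If I remember the EigenPlaces README correctly, the trained models can be loaded directly through torch.hub, roughly like this (please double-check the repo for the exact entrypoint name and the supported backbone / fc_output_dim combinations, as this is written from memory):

```python
import torch

# Assumed torch.hub entrypoint from the EigenPlaces repo; verify against its README
model = torch.hub.load("gmberton/eigenplaces", "get_trained_model",
                       backbone="ResNet50", fc_output_dim=2048)

model.eval()
with torch.no_grad():
    descriptors = model(torch.randn(1, 3, 512, 512))  # expected shape: (1, 2048)
print(descriptors.shape)
```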
Thank you so much for your detailed response!
Hello, @gmberton!
I am currently working on a project based on the deep visual geo-localization benchmark, using a ResNet101conv4 backbone (ResNet-101 truncated at conv4), which by default has an output dimension of 1024. For my specific application, I need to change the model's output dimension to 2048. I have identified two potential approaches to achieve this, and I would greatly appreciate your insights on which method might be more suitable, or whether there is another recommended strategy.
1. Using the command-line argument --fc_output_dim=2048 to set the output dimension directly through the parser.
2. Modifying network.py to manually insert an additional convolutional layer into the existing CNN architecture, along these lines (a fuller sketch is below the list):

```python
layers.append(nn.Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False))
```
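To make option 2 concrete, this is roughly how I imagine splicing the extra 1x1 convolution onto the truncated ResNet-101 backbone (a simplified standalone sketch; the actual structure of network.py may differ):

```python
import torch
import torch.nn as nn
import torchvision

# Truncate ResNet-101 after conv4 (layer3), which outputs 1024 channels,
# then append a 1x1 conv that expands the channel dimension to 2048.
resnet = torchvision.models.resnet101(weights="IMAGENET1K_V1")
layers = list(resnet.children())[:-3]  # drop layer4, avgpool, fc; keep up to layer3 (conv4)
layers.append(nn.Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False))
layers.append(nn.BatchNorm2d(2048))    # optional: normalize the newly added features
layers.append(nn.ReLU(inplace=True))
backbone = nn.Sequential(*layers)

x = torch.randn(1, 3, 224, 224)
print(backbone(x).shape)               # torch.Size([1, 2048, 14, 14])
```

The 1x1 convolution expands the channel dimension from 1024 to 2048 while leaving the spatial resolution unchanged, so the aggregation layer that follows would then see 2048-channel feature maps.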
Could you please provide guidance on the advantages and disadvantages of these approaches in the context of the Deep Visual Geo-localization Benchmark? Which method would you recommend for effectively changing the output dimension while maintaining or improving model performance?
Looking forward to your reply. Thank you in advance!