QVPR / Patch-NetVLAD

Code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition"

For academic communication #42

Closed · DonMuv closed this issue 2 years ago

DonMuv commented 2 years ago

Hi, I'm doing some related research at the moment. How should I modify the code if the backbone network is not vgg16?
https://github.com/QVPR/Patch-NetVLAD/blob/main/patchnetvlad/models/local_matcher.py#L91

H = int(int(config['imageresizeH']) / 16)  # 16 is the vgg scaling from image space to feature space (conv5)
W = int(int(config['imageresizeW']) / 16)

Suppose my network has 21 conv layers.

Thanks

StephenHausler commented 2 years ago

The hardcoded value 16 is the effective stride of the last conv layer of the neural network. To work out the effective stride for different networks and layers, please see the links below:

https://distill.pub/2019/computing-receptive-fields/

and then:

https://github.com/google-research/receptive_field/blob/master/receptive_field/RECEPTIVE_FIELD_TABLE.md

All you should have to do is find your network in that table and replace the 16 with the value of effective stride given there. Alternatively, you can work it out empirically: 16 is just the width of the input image divided by the spatial width W of the conv layer in question, which you can read off by passing a test image through your network.
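For example, a minimal sketch of that empirical check (the encoder below is just a placeholder for whatever truncated backbone you use in place of vgg16 conv5):

import torch

# Measure the effective stride empirically: push a dummy image through the
# backbone and compare input and output spatial sizes.
encoder = torch.nn.Sequential(                      # placeholder; swap in your own 21-conv-layer network
    torch.nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    torch.nn.MaxPool2d(kernel_size=2, stride=2),
)

dummy = torch.zeros(1, 3, 480, 640)                 # any input size divisible by the stride
with torch.no_grad():
    feat = encoder(dummy)                           # (1, C, H_feat, W_feat)

effective_stride = dummy.shape[-1] / feat.shape[-1]
print(effective_stride)                             # this value replaces the hardcoded 16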

You will also need to adjust the values in calc_receptive_boxes (line 65 of local_matcher.py). Again, these values can be found using the linked webpages.
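For reference, the linked table lists vgg_16/conv5_3 with a receptive field of 196, effective stride 16 and effective padding 90, so the edit is just swapping one triple of constants for another, roughly like this (the commented-out triple is an illustrative placeholder, not a real network):

# Constants consumed by calc_receptive_boxes (local_matcher.py, line 65).
rf, stride, padding = [196.0, 16.0, 90.0]   # vgg16 conv5_3, per the receptive-field table
# rf, stride, padding = [99.0, 8.0, 49.0]   # placeholder example for a different backbone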

divyagupta25 commented 1 year ago

Hi @StephenHausler, I have just a 2-layer conv-net, and I obtained its effective receptive field, stride, and padding from the link you mentioned above. I am getting None for the effective padding, so I used it as zero in this line: https://github.com/QVPR/Patch-NetVLAD/blob/cba383478e6b656c76a8c5034a3681f69ab59ddc/patchnetvlad/models/local_matcher.py#L65

However, I am getting a keypoint at (388, 388) even though my image size is (384, 384): https://github.com/QVPR/Patch-NetVLAD/blob/cba383478e6b656c76a8c5034a3681f69ab59ddc/patchnetvlad/models/local_matcher.py#L116

Could you please suggest what the problem could be? Below is the TensorFlow code I am using to get the effective receptive field, stride and padding.

import receptive_field as rf
import tensorflow.compat.v1 as tf
from tensorflow.keras import layers, models

# Construct the graph.
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, shape=(1, 384, 384, 3), name='input_image')
    model = models.Sequential()
    model.add(layers.Conv2D(filters=64, kernel_size=(7, 7), strides=(2, 2), activation='relu',
                            use_bias=False, padding='same', input_shape=(384, 384, 3)))
    model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))
    model.add(layers.Conv2D(filters=384, kernel_size=(7, 7), strides=(2, 2), activation='relu',
                            use_bias=False, padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same', name='op'))
    output = model(x)

res = [n.name for n in g.as_graph_def().node]     # print(res) to list the graph's node names

# Compute receptive field parameters from the Keras input node to the final pooling op.
rf_x, rf_y, eff_stride_x, eff_stride_y, eff_pad_x, eff_pad_y = rf.compute_receptive_field_from_graph_def(
    g.as_graph_def(), 'conv2d_input', 'op/MaxPool')

print("\nReceptive field: ")
print(rf_x, rf_y)                           # 41, 41

print("\nStride: ")
print(eff_stride_x, eff_stride_y)           # 16, 16

print("\nPadding: ")
print(eff_pad_x, eff_pad_y)                 # None, None

StephenHausler commented 1 year ago

Hi @divyagupta25,

Sorry for the late reply. Because you are using a 2-layer conv-net, you will need to make a number of changes, since our code hardcodes the vgg-16 conv5_3 backbone in several places.

First, you'll need to change line 65 of local_matcher.py to:

rf, stride, padding = [41.0, 16.0, 0.0]
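For context, here is a rough PyTorch sketch (not necessarily the repository's exact code) of the DELF-style box computation those three constants feed; every feature-map location is mapped to a receptive box in image coordinates:

import torch

def calc_receptive_boxes(height, width):
    # Map each feature-map location (y, x) to its receptive box
    # [ymin, xmin, ymax, xmax] in image space.
    rf, stride, padding = [41.0, 16.0, 0.0]   # values for the 2-layer conv-net above
    ys, xs = torch.meshgrid(torch.arange(height), torch.arange(width), indexing='ij')
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()  # (N, 2) of (y, x)
    point_boxes = torch.cat([coords, coords], dim=1)                   # (N, 4)
    bias = torch.tensor([-padding, -padding, -padding + rf - 1.0, -padding + rf - 1.0])
    return stride * point_boxes + bias

With a 384x384 input your feature map is 24x24, so the last location (index 23) maps to the box [368, 408] along each axis, whose centre is 388. That is most likely where the (388, 388) keypoint comes from: a keypoint near the image border sits at the centre of a receptive field that extends past the edge, so it is expected behaviour rather than a bug.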

Then, you'll need to edit lines 91 and 92 of local_matcher.py, replacing the number 16 with the new scaling from image space to feature space. You can calculate this ratio by dividing the input image size by the H, W dimensions of the output tensor of your two-layer conv-net.
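As a sketch, with the scaling stored in a variable called feature_scaling (a hypothetical name, not one used in the repository), lines 91 and 92 would become:

# Image size / feature-map size; for the 2-layer net in your snippet this is
# 384 / 24 = 16, matching the effective stride your script printed.
feature_scaling = 16
H = int(int(config['imageresizeH']) / feature_scaling)
W = int(int(config['imageresizeW']) / feature_scaling)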

Hope this helps.