Is it wrong to set out_channels = h*w = 400? Because: https://github.com/amaralibey/MixVPR/blob/31de0c35eef2c82589481293a200c9ff61741eb5/models/aggregators/mixvpr.py#L63
Hello @wpumain,
The backbone and aggregator components operate independently. Within the ResNet backbone, out_channels is adjusted based on the specified layers_to_crop parameter. When the last residual block is cropped, the channel dimension is halved (in ResNet architectures, each residual block has 2x the channels of the previous one and 2x fewer than the following one).
The codebase has been structured such that neither the resnet.py nor mixvpr.py files need to be altered. Both modules can be easily configured from the main.py file.
When employing ResNet18 and cropping the final residual block as suggested in MixVPR, the resulting feature maps will have 256 channels. Additionally, if the input images are 320x320, the feature maps' spatial dimensions will be 20x20. The overall output of the cropped ResNet18 will therefore be 256x20x20.
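You can verify those shapes quickly with a minimal sketch (using plain torchvision here rather than the repository's ResNet wrapper, just to illustrate the dimensions):

    import torch
    import torchvision

    # Cropped ResNet18: keep everything up to layer3 (layer4 removed).
    resnet = torchvision.models.resnet18(weights=None)
    trunk = torch.nn.Sequential(
        resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
        resnet.layer1, resnet.layer2, resnet.layer3,
    )
    with torch.no_grad():
        feats = trunk(torch.randn(1, 3, 320, 320))
    print(feats.shape)  # torch.Size([1, 256, 20, 20]) -> 256 channels, 20x20 spatial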
When creating a MixVPR instance, you specify the in_channels
as 256, in_h
as 20, and in_w
as 20. These parameters define the input feature map dimensions for MixVPR. The remaining parameters configure the MixVPR architecture, such as mix_depth, which sets the number of Mixer blocks to use, and out_channels, which determines the dimensionality reduction of the feature maps from in_channels to out_channels (in this case, from 256 to xxx). Additionally, out_rows specifies the projection of the flattened feature map's rows, which are in this case 20x20=400 in size. If you take a look at the figure of MixVPR architecture, things will get clearer.
The code in main.py will look like this:
model = VPRModel(
    #---- Encoder
    backbone_arch='resnet18',
    pretrained=True,
    layers_to_freeze=2,
    layers_to_crop=[4],       # 4 crops the last resnet layer, 3 crops the 3rd, ...etc

    #---- Aggregator
    agg_arch='MixVPR',
    agg_config={
        'in_channels' : 256,  # nb of channels in the MixVPR input feature maps
        'in_h' : 20,          # height of the input feature maps
        'in_w' : 20,          # width of the input feature maps
        'mix_depth' : 4,
        'mlp_ratio' : 1,
        'out_channels' : 128, # the channel-wise reduction (could be any other value)
        'out_rows' : 4},      # the output dim will be (out_rows * out_channels)
    ...
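With this configuration, the final MixVPR descriptor will be out_rows * out_channels = 4 * 128 = 512-dimensional. A quick way to sanity-check the shapes (a minimal sketch, assuming the MixVPR constructor accepts exactly the keyword arguments listed in agg_config):

    import torch
    from models.aggregators.mixvpr import MixVPR

    # Assumption: the constructor takes the same keyword arguments as agg_config above.
    agg = MixVPR(in_channels=256, in_h=20, in_w=20,
                 mix_depth=4, mlp_ratio=1,
                 out_channels=128, out_rows=4)
    dummy = torch.randn(1, 256, 20, 20)  # cropped-ResNet18 feature maps for a 320x320 image
    desc = agg(dummy)
    print(desc.shape)                    # expected: torch.Size([1, 512]) == (1, out_rows * out_channels)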
It's worth noting that if you wish to use smaller input images, say 224x224, you can adjust the in_h and in_w parameters in MixVPR accordingly. Remember that when cropping ResNet at the 4th layer, the spatial dimensions are always reduced by a factor of 16. As a result, the feature maps produced for a 224x224 image will have a spatial dimension of 14x14. This value should be specified in MixVPR by setting in_h=14 and in_w=14.
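In other words, for a backbone cropped at the 4th layer the relationship is simply (a minimal sketch, assuming the image side is a multiple of 16):

    image_size = 224
    stride = 16                           # total spatial reduction of ResNet cropped at layer 4
    in_h = in_w = image_size // stride    # 224 // 16 = 14 (and 320 // 16 = 20)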
Please let me know if there is anything that remains unclear.
Thank you very much for your help. https://github.com/amaralibey/MixVPR/blob/31de0c35eef2c82589481293a200c9ff61741eb5/models/backbones/resnet.py#L87 The self.out_channels in ResNet must be equal to in_channels in MixVPR, but the code at that line doesn't seem to handle this. I think maybe we should pass self.out_channels from ResNet to in_channels in agg_config, so that we don't have to manually modify in_channels in agg_config.
'in_channels' : 256, # nb of channels in the MixVPR input feature maps
What does nb stand for?
Thank you very much for your help.
The self.out_channels in ResNet must be equal to in_channels in MixVPR, but the code at that line doesn't seem to handle this. I think maybe we should pass self.out_channels from ResNet to in_channels in agg_config, so that we don't have to manually modify in_channels in agg_config.
It's the other way around: we first take a ResNet backbone and feed its output to MixVPR. It's always up to MixVPR to adapt to the output of the backbone; we don't need to modify the code in resnet.py.
So it's in_channels of MixVPR that must be equal to out_channels of ResNet.
The code you're referring to ensures that if we crop ResNet at the 3rd layer, the number of channels is divided by 4; if we crop at the 4th layer, it is divided by 2; otherwise it is kept at 2048 for ResNet-50/101/152 and 512 for ResNet-18/34.
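As a rough sketch of that bookkeeping (this mirrors the description above, not the exact code in resnet.py):

    def backbone_out_channels(backbone_arch, layers_to_crop):
        # Full-depth channel count: 512 for ResNet-18/34, 2048 for ResNet-50/101/152.
        channels = 512 if backbone_arch in ('resnet18', 'resnet34') else 2048
        if 3 in layers_to_crop:
            return channels // 4   # cropped at the 3rd residual block
        if 4 in layers_to_crop:
            return channels // 2   # cropped at the 4th (last) residual block
        return channels

    print(backbone_out_channels('resnet18', [4]))  # 256 -> this is what MixVPR's in_channels must match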
The reason in_channels must be set manually is a choice we made so that this framework can train any aggregation technique (CosPlace, NetVLAD, GeM, ...etc). Each of these techniques uses a different name for in_channels, and we didn't want to alter their code to rename the parameter. If you want to make this automatic, a better way to do it is to add a parameter named in_channels to the helpers.get_aggregator() method, and change the function to take it into account when calling the aggregator.
I will add these changes in the next sync.
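As a hypothetical sketch of that change (the real helpers.get_aggregator() may look different; this is just the idea):

    from models.aggregators.mixvpr import MixVPR

    def get_aggregator(agg_arch, agg_config, in_channels=None):
        # If the backbone's out_channels is passed in, let it override the manual value.
        if in_channels is not None:
            agg_config = {**agg_config, 'in_channels': in_channels}
        if 'mixvpr' in agg_arch.lower():
            return MixVPR(**agg_config)
        raise NotImplementedError(agg_arch)  # other aggregators omitted in this sketch

    # Then, in main.py (hypothetical usage):
    # aggregator = get_aggregator(agg_arch, agg_config, in_channels=backbone.out_channels)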
'in_channels' : 256, # nb of channels in the MixVPR input feature maps
What does nb stand for?
nb means number, so: the number of channels in the MixVPR input feature maps
Thank you very much for your detailed guidance
When I use resnet18, the source code already adjusts the value of self.out_channels automatically in the ResNet class: https://github.com/amaralibey/MixVPR/blob/31de0c35eef2c82589481293a200c9ff61741eb5/models/backbones/resnet.py#L86
However, there is no code in MixVPR that automatically adjusts in_channels and out_channels. Should I modify them manually in this case? https://github.com/amaralibey/MixVPR/blob/31de0c35eef2c82589481293a200c9ff61741eb5/models/aggregators/mixvpr.py#L55
I should change the value of in_channels to 256, set out_channels = h*w = 400, and keep out_rows unchanged at 4, right?