gmberton / deep-visual-geo-localization-benchmark

Official code for CVPR 2022 (Oral) paper "Deep Visual Geo-localization Benchmark"
MIT License

torchscan: list index out of range #5

Closed sijieaaa closed 1 year ago

sijieaaa commented 1 year ago

Hi,

torchscan does not seem to support CUDA 11.

[screenshot: traceback ending in IndexError: list index out of range]

gmberton commented 1 year ago

Hello, thanks for your interest! Can you share which parameters you passed to the script to obtain this error? Can you also share the debug.log file? It looks like the problem is due to the input_shape variable.
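For reference, torchscan is typically called as in the sketch below (generic usage, not the benchmark's exact call); if the input_shape tuple does not match what the backbone actually expects, the crawl can fail with errors like this one.

```python
# Generic torchscan usage (a sketch, not this repository's exact call):
# the second argument is the per-sample input shape (C, H, W).
import torch
import torchvision
from torchscan import summary

model = torchvision.models.resnet18().eval()
# If this shape does not match what the backbone expects,
# torchscan can fail while crawling the module (e.g. with an IndexError).
summary(model, (3, 224, 224))
```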

zafirshi commented 1 year ago

Hi~ :smile:

I hit the same error when using torchscan to estimate the model size, specifically when setting ViT or CCT as the backbone.

I don't know why this happens or how to fix it, so I just commented out the related code; everything else runs fine.

However, if you can fix it, that would be great~

Here are the relevant package versions from my environment:

pytorch                   1.10.0          py3.7_cuda11.3_cudnn8.2.0_0    pytorch
torchscan                 0.1.1                                     0    frgfm
gmberton commented 1 year ago

Thank you @zafirshi, we'll try to fix it. Can you share the debug.log file generated by the run that produces this error?

zafirshi commented 1 year ago

Hi~ I figured out what happened. It was actually my mistake, not a problem with the code. I'm really sorry to have bothered you. :persevere:

As you guessed, it is indeed a problem with the input image size. When running with a ViT or CCT backbone, the input images need to be hard-resized to 224x224 or 384x384 to use the pretrained weights. (CCT needs 384x384 only because cct_14_7x2_384 is used; with the appropriate code changes, CCTs of other input sizes can also be used.)
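For reference, here is a minimal resizing sketch (using plain torchvision transforms as an illustration; the benchmark has its own dataset/resizing options, so treat the names here as assumptions):

```python
# A minimal sketch: hard-resize inputs so they match the pretrained
# ViT (224x224) or cct_14_7x2_384 (384x384) weights.
import torchvision.transforms as T

resize_side = 384  # 224 for ViT-224 checkpoints, 384 for cct_14_7x2_384
transform = T.Compose([
    T.Resize((resize_side, resize_side)),  # hard resize, ignores aspect ratio
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```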

I don't know whether the original poster's error had the same cause as mine. @sijieaaa :wave:

After changing the input size to 384x384, the code runs normally with the CCT configuration. However, when using ViT as the backbone and NetVLAD as the aggregation method, another error is reported:

AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'norm' 

In the line below, the variable outputs is not a tensor; it is an instance of the class BaseModelOutputWithPooling, because the ViT backbone comes from the third-party package transformers. (With the CCT backbone, the output is a tensor.)

https://github.com/gmberton/deep-visual-geo-localization-benchmark/blob/ace2a644422dd109d28a2f9fb77c292e3d1d9e97/model/aggregation.py#L162

I think this bug can be fixed by making the backbone output a tensor. timm makes this easy, so a ViT+NetVLAD model should be straightforward to get working! :+1:
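For example, a rough sketch of the timm route (generic timm usage, not code from this repo; the exact output shape depends on the timm version):

```python
# A sketch assuming the timm library: timm's ViT returns plain tensors,
# so the token features can be fed to an aggregation layer directly.
import timm
import torch

vit = timm.create_model("vit_base_patch16_224", pretrained=False)
x = torch.randn(1, 3, 224, 224)
tokens = vit.forward_features(x)  # a plain tensor, not a ModelOutput object
# Depending on the timm version this is either the full token sequence
# (B, num_tokens, dim) or just the class token (B, dim).
print(tokens.shape)
```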

gmberton commented 1 year ago

We just added the possibility to use ViT with NetVLAD, by making sure that ViT also outputs a tensor in all cases :)
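Conceptually, the change amounts to something like the sketch below (an illustration of the idea, not the exact code in the repository): wrap the transformers ViT so its forward returns a plain tensor, e.g. its last_hidden_state. The checkpoint name here is only an example.

```python
# Hedged sketch of the general idea (not necessarily the repository's fix):
# wrap the HuggingFace ViT so forward() returns a plain tensor instead of
# a BaseModelOutputWithPooling, which the aggregation layer cannot handle.
import torch
from transformers import ViTModel

class ViTTensorBackbone(torch.nn.Module):
    def __init__(self, name="google/vit-base-patch16-224-in21k"):
        super().__init__()
        self.vit = ViTModel.from_pretrained(name)

    def forward(self, x):
        outputs = self.vit(x)             # BaseModelOutputWithPooling
        return outputs.last_hidden_state  # plain tensor (B, num_tokens, dim)
```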