facebookresearch / deepcluster

Deep Clustering for Unsupervised Learning of Visual Features

Can't reproduce retrieval numbers on Oxford. #58

Closed mbsariyildiz closed 4 years ago

mbsariyildiz commented 4 years ago

Dear @mathildecaron31

I am trying to reproduce the retrieval scores and having the following small issue. I downloaded the datasets and compiled the evaluation code according to your instructions in eval_retrieval.sh. When I run the code for "ImageNet pre-trained" and "DeeperCluster" models, I get the following results:

| Method | Paris | Oxford |
| --- | --- | --- |
| ImageNet Labels (you report) | 81.5 | 72.4 |
| ImageNet Labels (I compute) | 81.3 | 64.9 |
| DeeperCluster (you report) | 73.4 | 55.8 |
| DeeperCluster (I compute) | 73.3 | 54.0 |

The default setting of eval_retrieval.sh evaluates models on Paris. So, to evaluate the models on Oxford, I set

EVAL="Oxford"
PCA="Paris"

Do you see any mistake here? Why do you think the results I compute on Oxford are significantly lower for "ImageNet Labels" and slightly lower for "DeeperCluster"?

Many thanks.

mathildecaron31 commented 4 years ago

Hi, the results for ImageNet labels come from a training with Sobel filtering. See Section 5.3 of the DeepCluster paper:

> Table 5 reports the performance of a VGG-16 trained with different approaches obtained with Sobel filtering, except for Doersch et al. [25] and Wang et al. [46]. This preprocessing improves by 5.5 points the mAP of a supervised VGG-16 on the Oxford dataset, but not on Paris.

I assume you haven't trained your ImageNet labels baseline with Sobel, which would explain why you are not reproducing these numbers (especially the large performance gap for Oxford).
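In case it helps, here is a minimal sketch of what a fixed Sobel preprocessing layer looks like in PyTorch: a 1x1 conv that averages RGB to grayscale, followed by a 3x3 conv holding the two standard Sobel kernels, with both convs frozen. This is an illustration under those assumptions, not the exact code from either repo:

```python
import torch
import torch.nn as nn

def make_sobel():
    # 1x1 conv: average the three RGB channels into one grayscale channel
    grayscale = nn.Conv2d(3, 1, kernel_size=1, stride=1)
    grayscale.weight.data.fill_(1.0 / 3.0)
    grayscale.bias.data.zero_()

    # 3x3 conv: the two standard Sobel kernels (horizontal and vertical gradients)
    sobel = nn.Conv2d(1, 2, kernel_size=3, stride=1, padding=1)
    sobel.weight.data[0, 0].copy_(
        torch.tensor([[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]])
    )
    sobel.weight.data[1, 0].copy_(
        torch.tensor([[1.0, 2.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -2.0, -1.0]])
    )
    sobel.bias.data.zero_()

    layers = nn.Sequential(grayscale, sobel)
    # The filter is fixed preprocessing, so it is never trained
    for p in layers.parameters():
        p.requires_grad = False
    return layers

x = torch.randn(1, 3, 32, 32)
out = make_sobel()(x)  # shape (1, 2, 32, 32): two gradient channels
```

A network trained on this two-channel gradient input cannot be evaluated against an RGB-trained baseline without accounting for the difference, which is why the supervised numbers in the table only match when the baseline also uses Sobel.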

mathildecaron31 commented 4 years ago

Hi, I've been investigating the performance difference for the DeeperCluster models. When loading DeeperCluster models into the eval_retrieval.py code, you need to adjust the padding used for the Sobel layers carefully.

Indeed, the vgg-16 models with Sobel filtering from this repo use a padding of 1 (see sobel.1):

(sobel): Sequential(
    (0): Conv2d(3, 1, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(1, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )

However, the vgg-16 models with Sobel filtering from the DeeperCluster repo use a padding of 2 (see the padding layer):

(padding): ConstantPad2d(padding=(2, 2, 2, 2), value=0.0)
(sobel): Sequential(
    (0): Conv2d(3, 1, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(1, 2, kernel_size=(3, 3), stride=(1, 1))
)

Hence, when using DeeperCluster vgg-16 models with Sobel filtering in this repo, you should increase the effective padding to 2 instead of 1. For example, you can do so by adding the line vc = torch.nn.ConstantPad2d(1, 0)(vc) just before this line and this line. This way, you should be able to reproduce the numbers from the DeeperCluster paper.
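To see why the extra pad works, note that a 3x3 conv with padding=1 applied after an additional ConstantPad2d(1, 0) sees an effective zero-padding of 2, which matches the DeeperCluster layout (ConstantPad2d(2) followed by an unpadded 3x3 conv). A small self-contained check, with freshly initialized convs rather than real model weights:

```python
import torch
import torch.nn as nn

# This repo's layout: Sobel conv carries its own padding=1
conv_this_repo = nn.Conv2d(1, 2, kernel_size=3, stride=1, padding=1)

# DeeperCluster's layout: explicit pad of 2, then an unpadded conv
conv_deeper = nn.Conv2d(1, 2, kernel_size=3, stride=1)
conv_deeper.weight.data.copy_(conv_this_repo.weight.data)  # same weights
conv_deeper.bias.data.copy_(conv_this_repo.bias.data)

x = torch.randn(1, 1, 8, 8)

# Suggested fix: add one extra ring of zeros before the padding=1 conv
y1 = conv_this_repo(nn.ConstantPad2d(1, 0.0)(x))
# DeeperCluster reference: pad by 2, then the unpadded conv
y2 = conv_deeper(nn.ConstantPad2d(2, 0.0)(x))

print(torch.allclose(y1, y2))  # True: both see effective padding of 2
```

Without the extra pad, the two models compute features on slightly different spatial supports near the image border, which is enough to shift retrieval mAP by the ~1-2 points observed above.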

Hope that helps

mbsariyildiz commented 4 years ago

Hello,

Thank you for addressing the issue.

I saw the caption of the corresponding table but found the code misleading, because when

MODEL='pretrained'

here, the code loads a pre-trained model from the torchvision repo here. Instead, I expected you to share the pre-trained ImageNet model.

I haven't re-run the evaluation code with the modifications you suggested yet. But I find it quite interesting that such fine details of the Sobel filtering (the padding, for instance) affect the results this much.

Thanks