filipradenovic / cnnimageretrieval-pytorch

CNN Image Retrieval in PyTorch: Training and evaluating CNNs for Image Retrieval in PyTorch
http://cmp.felk.cvut.cz/cnnimageretrieval

GeM pooling parameter #47

Open · andrefaraujo opened this issue 5 years ago

andrefaraujo commented 5 years ago

Hi @filipradenovic ,

For your experiment on networks with whitening learned end-to-end, with triplet loss, trained on the Google Landmarks dataset 2018: could you share the value to which the GeM pooling parameter p converged?

If you could share a learning curve showing the evolution of p over the training run, that would be even better :)

Thanks!

filipradenovic commented 5 years ago

The converged p values:

gl18-tl-resnet50-gem-w: 2.8180
gl18-tl-resnet101-gem-w: 2.8640
gl18-tl-resnet152-gem-w: 2.9059

I don't have the evolution of p over the training run at hand right now; if I manage to find it, I will update the response with the curve.
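
For context, p here is the exponent of the generalized-mean (GeM) pooling layer, learned jointly with the rest of the network. A minimal PyTorch sketch of GeM pooling, along the lines of the gem function in cirtorch/layers/pooling.py (p = 1 recovers average pooling, large p approaches max pooling):

import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    # Generalized-mean pooling with a learnable exponent p.
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)  # trained with the rest of the network
        self.eps = eps

    def forward(self, x):
        # x: (B, C, H, W) feature map -> (B, C, 1, 1) global descriptor
        return F.avg_pool2d(
            x.clamp(min=self.eps).pow(self.p),
            (x.size(-2), x.size(-1)),
        ).pow(1.0 / self.p)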

andrefaraujo commented 5 years ago

Thanks! No problem if you don't have the curve; I am definitely more interested in the final value.

I am training a ResNet50 with ArcFace loss, GeM pooling, and a whitening layer, but somehow the GeM power p keeps converging to 1 (i.e., average pooling). I tried increasing the LR for p (as done in your code), but it didn't really help. I guess it's hard to debug this, but if you have any thoughts on what might be wrong here, please let me know :)

filipradenovic commented 5 years ago

I haven't tried training with ArcFace loss, but that should not be the problem. Maybe try the opposite: reduce the LR for p only and observe how it changes. At some point during training it may start moving towards values other than 1; at that point you can try increasing the learning rate again, or just keep it at the value where the learning of p started "working".
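
A per-parameter learning rate like this can be set with optimizer parameter groups. A minimal sketch, assuming a hypothetical model whose GeM layer (as sketched above) is stored as model.pool:

import torch
import torch.nn as nn

class Net(nn.Module):
    # Toy stand-in for a retrieval network; the names here are illustrative.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, kernel_size=3)
        self.pool = GeM()  # GeM module from the sketch above

model = Net()
base_lr = 5e-7
p_param = model.pool.p
other_params = [q for q in model.parameters() if q is not p_param]

# A separate parameter group gives p its own (here: reduced) learning rate.
optimizer = torch.optim.Adam(
    [{'params': other_params}, {'params': [p_param], 'lr': base_lr * 0.1}],
    lr=base_lr,
)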

mrgransky commented 3 years ago

I plotted loss vs. epoch for Triplet (margin=0.5), Contrastive (margin=0.5), and ArcFace (margin=0.5, scale=1.0) losses, trained with the following command:

# --loss was 'triplet', 'contrastive', or 'arcface' for the respective runs below
python -m cirtorch.examples.train ./log \
    --gpu-id '0' \
    --print-freq 1000 \
    --epochs 200 \
    --training-dataset 'retrieval-SfM-120k' \
    --test-datasets 'roxford5k,rparis6k' \
    -a 'resnet101' \
    --pool 'gem' \
    --loss 'triplet' \
    --loss-margin 0.5 \
    --optimizer 'adam' \
    --lr 5e-7 \
    --whitening \
    --neg-num 5 \
    --query-size=25 \
    --pool-size=300 \
    --batch-size 5 \
    --image-size 362

I wonder: can we conclude/generalize that ArcFace loss outperforms both Contrastive and Triplet losses for CNNs that use GeM pooling for global feature extraction?

This is pointed out in the paper Unifying Deep Local and Global Features for Image Search:

Global features. For global feature learning, we adopt a suitable loss function with L2-normalized classifier weights W, followed by scaled softmax normalization and cross-entropy loss [59]; this is sometimes referred to as a "cosine classifier". Additionally, we adopt the ArcFace margin [11], which has shown excellent results for global feature learning by inducing smaller intra-class variance.
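
A minimal sketch of that recipe (cosine classifier plus additive angular ArcFace margin); embed_dim, num_classes, margin, and scale here are illustrative choices, not the settings from the runs above:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    # Cosine classifier with an additive angular (ArcFace) margin.
    def __init__(self, embed_dim, num_classes, margin=0.5, scale=30.0):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.margin = margin
        self.scale = scale

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class weights.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.W))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the margin to the target-class angle only.
        target = F.one_hot(labels, num_classes=self.W.size(0)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cos)
        # Scaled softmax + cross-entropy, as in the quoted passage.
        return F.cross_entropy(self.scale * logits, labels)

# Usage on random data:
loss = ArcFaceLoss(embed_dim=128, num_classes=10)(torch.randn(4, 128), torch.tensor([0, 1, 2, 3]))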

Triplet m=0.5: [plot: triplet_200epochs]

Contrastive m=0.5: [plot: contrastive_200epochs]

ArcFace m=0.5, scale=1.0: [plot: arcface_200epochs]