Hi there! Thanks for adding all the info when opening this issue, makes it much easier to debug :).
So while the setup you evaluate on is different from the one we trained/tested on, the changes shouldn't be that significant.
The first thing that comes to mind would be to check the influence of the single scheduling step, as convergence behaviour may be influenced a bit, i.e. adjusting the default `--tau 55 --gamma 0.2` to something in e.g. `--tau [40, ..., 80]` and trying `--gamma [0.1, 0.3]`, or testing two scheduling steps such as `--tau 60 90 --gamma 0.3` (although this shouldn't be necessary).
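For reference, a minimal sketch of the schedule these flags describe; that the repo maps `--tau`/`--gamma` onto `torch.optim.lr_scheduler.MultiStepLR` (or an equivalent step scheduler) is my assumption:

```python
import torch

# Placeholder model/optimizer just to make the sketch self-contained.
model = torch.nn.Linear(128, 128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

# --tau 60 90 --gamma 0.3: multiply the learning rate by 0.3 at epochs 60 and 90.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 90], gamma=0.3)

for epoch in range(200):
    # ... one training epoch ...
    scheduler.step()
```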
Finally, due to the more complex nature of training, there is a somewhat higher seed-based dependence of the final performance, so trying different seeds may be helpful to check that it is not just a seed-based deviation (i.e. setting `--seed 0/1/2/3/4`).
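As a minimal sketch of what varying `--seed` usually amounts to (assuming the pipeline seeds the standard RNG sources; the helper name is mine):

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Seed every RNG that typically affects a training run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# e.g. rerun training with set_seed(0), set_seed(1), ..., set_seed(4)
```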
Just to be sure I checked again, and with `--tau 60 --gamma 0.3` I get results as shown in the attached image.
Also note that on CUB you for sure don't have to train for 350 epochs; I easily get the best performance in under 175/200 epochs :).
Thanks for your detailed reply! I will give it a try and get back to you after my experiments.
Hi @Confusezius , I conducted some experiments but still can't reach an R@1 of 69.2. This time I also used the first environment setting (pytorch1.8.0.dev + faiss_gpu1.4.0 + cuda11); the best R@1 results I get for different seeds and for different scheduling settings are shown in the attached tables.
The R@1 plot of setting 1 is shown below.
The R@1 plot of setting 8 is also shown below. I find that the R@1 at epoch 60 is about 0.65, which is already worse than your result of about 0.67.
The parameter info of setting 8 is shown below.
```
dataset: cub200
train_val_split: 1
lr: 1e-05
fc_lr: -1
n_epochs: 200
kernels: 8
bs: 112
seed: 0
scheduler: step
gamma: 0.3
decay: 0.0004
tau: [60, 90]
use_sgd: False
loss: margin
batch_mining: distance
extension: none
embed_dim: 128
arch: resnet50_frozen_normalize
not_pretrained: False
evaluation_metrics: ['e_recall@1', 'e_recall@2', 'e_recall@4', 'nmi', 'f1', 'mAP_c']
evaltypes: ['Combined_discriminative_selfsimilarity_shared_intra-0.75-1.25-1.25-1.25', 'Combined_discriminative_selfsimilarity_shared_intra-0.5-1-1-1', 'Combined_discriminative_selfsimilarity_shared_intra-0.5-1.5-1.5-1.5']
storage_metrics: ['e_recall@1']
realistic_augmentation: False
realistic_main_augmentation: False
gpu: [0]
savename: 
source_path: ./datasets/cub200
save_path: /data/dyfine/ECCV2020_DiVA_MultiFeature_DML-master/Training_Results/cub200/CUB200_RESNET50_FROZEN_NORMALIZE_2020-12-24-23-54-53
data_sampler: class_random
samples_per_class: 2
data_batchmatch_bigbs: 512
data_batchmatch_ncomps: 10
data_storage_no_update: False
data_d2_coreset_lambda: 1
data_gc_coreset_lim: 1e-09
data_sampler_lowproj_dim: -1
data_sim_measure: euclidean
data_gc_softened: False
data_idx_full_prec: False
data_mb_mom: -1
data_mb_lr: 1
miner_distance_lower_cutoff: 0.5
miner_distance_upper_cutoff: 1.4
loss_contrastive_pos_margin: 0
loss_contrastive_neg_margin: 1
loss_triplet_margin: 0.2
loss_margin_margin: 0.2
loss_margin_beta_lr: 0.0005
loss_margin_beta: 1.2
loss_margin_nu: 0
loss_margin_beta_constant: False
loss_proxynca_lr: 0.0005
loss_npair_l2: 0.005
loss_angular_alpha: 36
loss_angular_npair_ang_weight: 2
loss_angular_npair_l2: 0.005
loss_multisimilarity_pos_weight: 2
loss_multisimilarity_neg_weight: 40
loss_multisimilarity_margin: 0.1
loss_multisimilarity_thresh: 0.5
loss_lifted_neg_margin: 1
loss_lifted_l2: 0.005
loss_binomial_pos_weight: 2
loss_binomial_neg_weight: 40
loss_binomial_margin: 0.1
loss_binomial_thresh: 0.5
loss_quadruplet_alpha1: 1
loss_quadruplet_alpha2: 0.5
loss_softtriplet_n_centroids: 10
loss_softtriplet_margin_delta: 0.01
loss_softtriplet_gamma: 0.1
loss_softtriplet_lambda: 20
loss_softtriplet_reg_weight: 0.2
loss_softtriplet_lr: 0.0005
loss_softmax_lr: 1e-05
loss_softmax_temperature: 0.05
loss_histogram_nbins: 51
loss_snr_margin: 0.2
loss_snr_reg_lambda: 0.005
loss_snr_beta: 0
loss_snr_beta_lr: 0.0005
loss_arcface_lr: 0.0005
loss_arcface_angular_margin: 0.5
loss_arcface_feature_scale: 64
loss_quadruplet_margin_alpha_1: 0.2
loss_quadruplet_margin_alpha_2: 0.2
log_online: False
wandb_key: <your_api_key_here>
project: DiVA_SampleRuns
group: CUB_DiVA-R50-512
diva_ssl: fast_moco
diva_sharing: random
diva_intra: random
diva_features: ['discriminative', 'selfsimilarity', 'shared', 'intra']
diva_decorrelations: ['selfsimilarity-discriminative', 'shared-discriminative', 'intra-discriminative']
diva_rho_decorrelation: [1500.0, 1500.0, 1500.0]
diva_decorrnet_dim: 512
diva_decorrnet_lr: 1e-05
diva_instdiscr_temperature: 0.1
diva_dc_update_f: 2
diva_dc_ncluster: 300
diva_moco_momentum: 0.9
diva_moco_temperature: 0.01
diva_moco_n_key_batches: 30
diva_moco_lower_cutoff: 0.5
diva_moco_upper_cutoff: 1.4
diva_moco_temp_lr: 0.0005
diva_moco_trainable_temp: False
diva_alpha_ssl: 0.3
diva_alpha_shared: 0.3
diva_alpha_intra: 0.3
pretrained: True
device: cuda
network_feature_dim: 2048
n_classes: 100
```
I have checked my CUB dataset, and it is the same as the one in https://github.com/Confusezius/Revisiting_Deep_Metric_Learning_PyTorch. The only thing I changed about the code is in adversarial_seperation.py, due to a requirement of pytorch1.8 (also pytorch1.5), as shown below.
```python
import torch

class GradRev(torch.autograd.Function):
    """Gradient reversal: identity forward pass, negated gradient backward."""

    @staticmethod
    def forward(ctx, x):
        # Identity in the forward pass.
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the sign of the incoming gradient.
        return grad_output * -1.

def grad_reverse(x):
    return GradRev.apply(x)
```
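A quick sanity check (my own snippet, not part of the repo) that this rewrite still reverses gradients, run after the definitions above:

```python
x = torch.ones(3, requires_grad=True)
grad_reverse(x).sum().backward()
print(x.grad)  # tensor([-1., -1., -1.]) -- gradients are negated
```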
I still can't figure out what the problem is. Do you have any idea?
That is really quite weird - there are some things you can try to generally improve performance:
[1] Train the 68.55 run for 300 epochs to see if the performance still improves, just for completeness.
[2] Adjust the adversarial weightings `--diva_rho_decorrelation` to e.g. `[1000, 1000, 1000]` or `[2000, 2000, 2000]`, or adjust the weight terms `--diva_alpha_ssl/shared/intra` to e.g. `0.2` or `0.4`, to see if the change in convergence can be accounted for by slightly adjusting the levels of regularisation.
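For what it's worth, a hedged sketch of such a sweep; the `main.py` entry point is an assumption, so substitute the repo's actual training script:

```python
import itertools
import subprocess

# Sweep the regularisation strengths suggested in [2] above.
# NOTE: "main.py" is a placeholder for the repo's training script.
for rho, alpha in itertools.product([1000, 1500, 2000], [0.2, 0.3, 0.4]):
    subprocess.run(
        ["python", "main.py", "--dataset", "cub200",
         "--diva_rho_decorrelation", str(rho), str(rho), str(rho),
         "--diva_alpha_ssl", str(alpha),
         "--diva_alpha_shared", str(alpha),
         "--diva_alpha_intra", str(alpha)],
        check=True)
```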
Once I have the time, I'll also check again with newer PyTorch versions to see if I can replicate the issue! The plot I published was produced with a repo that has since been adjusted, so I'll also check with a version closest to this specific one :).
Thanks for your reply and suggestions :) I'll keep trying, and could you share the specific environment settings you use (e.g. the versions of pytorch, faiss, and cuda)? I may try them if available.
Hi, may I ask whether you have reproduced the results on Cars196? I tried but only got about 84 for the Recall@1 metric. BTW, I have set `[diva_rho_decorrelation, alpha] = [100, 0.1]` as the paper said. @Dyfine
Hi, thanks for your great work; DiVA is such interesting work! Could you please provide information about the hyperparameters on the Cars196 and SOP datasets? I came across some issues when trying to reproduce the results in your paper. @Confusezius
Hey there, so I was able to reproduce the results on two separate server instances, and the specific parameters used were `rho_decorrelation = 100, alpha = 0.15` for Cars196 (slight difference to the results reported in the paper due to some small pipeline changes; also make sure you check the results which reweigh the non-discriminative branches with 1.5 and the discriminative one with 0.5, which offers the best regularization) and `rho_decorrelation = 150, alpha = 0.2`.
Let me know if that helps! Since there are quite a lot of moving parts, it can be a bit fickle and setup dependent.
Hi, thanks for your reply! I will try harder on Cars196 based on the details you supplied, and I'll be back here if I get some new results. Is the setting `rho_decorrelation = 150, alpha = 0.2` for the SOP dataset? @Confusezius
Yes it is :)
> Hi, may I ask whether you have reproduced the results on Cars196? I tried but only got about 84 for the Recall@1 metric. BTW, I have set `[diva_rho_decorrelation, alpha] = [100, 0.1]` as the paper said. @Dyfine
Hi @XinyiXuXD , sorry, I haven't conducted experiments on the Cars and SOP datasets. May I ask whether you have reproduced the results on CUB? My experiments on CUB reach a best R@1 of 68.84, which is a little worse than the reported result.
Hi @Dyfine, I didn't get the performance reported in the paper either.
Hey, an R@1 of 68.84 is reasonably close on CUB; however, 84 is way too low on Cars196 and shouldn't be happening. Are you using the InceptionNet backbone? Because in that case, 84 would be the score range in which you land.
@XinyiXuXD Just to make sure, could you list the parameter settings with which you are running on Cars196 (and the other datasets)?
Hi @Confusezius, I set the weight for each branch to 1 at first and got around 84 for Cars196. Following your reply, I reweighted the discriminative branch with 0.5 and the non-discriminative ones with 1.5 and got around 86 for Cars196. Based on my experimental results, the branch weights have a big effect.
BTW, how did you choose the weights for the branches?
The branch weights you can get from simple (cross-)validation; they transfer very well to the test case :). Indeed, the default setting should cover a range of default branch weightings that may help. Without validation experiments it's really hard to determine what good branch weights are; it depends on how well these auxiliary features can be estimated on a given dataset. For example, for Cars196, all auxiliary feature types are well defined and can be estimated pretty well, which is why a higher weighting is beneficial.
Either way, treat the weighting as a simple hyperparameter determined via validation experiments :) (see the sketch below).
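To make the reweighting concrete, here is a minimal sketch of evaluation-time branch combination, assuming each branch yields L2-normalised embeddings; the function name and layout are mine, not the repo's:

```python
import torch
import torch.nn.functional as F

def combine_branches(embeds, weights):
    """Weighted concatenation of per-branch embeddings for evaluation.

    embeds:  dict of branch name -> (N, d) L2-normalised embeddings
    weights: dict of branch name -> scalar weight, e.g. 0.5 for the
             discriminative branch and 1.5 for the auxiliary branches
             (the Cars196 setting discussed above).
    """
    parts = [weights[name] * embeds[name] for name in sorted(embeds)]
    # Re-normalise the joint embedding before computing retrieval metrics.
    return F.normalize(torch.cat(parts, dim=1), dim=1)
```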
Let me know if there are other issues either by reopening or opening another issue :).
Hi, thanks for your great work and the DiVA repo! Currently I'm reproducing your paper results with this repo, running ECCV2020_DiVA_SampleRuns.sh on CUB (resnet50). However, the best result I get is R@1 = 68.35, which is worse than the 69.2 reported in the paper. I use pytorch1.8.0.dev + faiss_gpu1.4.0 + cuda11 on one 3090 GPU, and the detailed results are shown below. I wonder whether there is any problem with the environment settings, or whether I should modify the settings in ECCV2020_DiVA_SampleRuns.sh. Besides, I also tried another environment setting, pytorch1.5.1 + faiss_gpu1.6.3 + cuda10 on two 2080Ti GPUs (I added torch.nn.DataParallel to model and selfsim_model, since the run can't fit on one 2080Ti; a sketch of this change follows below), and the best results are R@1 = 68.26 (0.75-1.25-1.25-1.25) and NMI = 71.20 (0.5-1-1-1). Thanks.
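For reference, the DataParallel change mentioned above amounts to something like this (a minimal sketch; the placeholder networks stand in for the repo's `model` and `selfsim_model`):

```python
import torch
import torchvision

# Placeholder networks standing in for the repo's model / selfsim_model.
model = torchvision.models.resnet50()
selfsim_model = torchvision.models.resnet50()

# Split each forward pass across the two 2080Ti GPUs.
model = torch.nn.DataParallel(model, device_ids=[0, 1]).cuda()
selfsim_model = torch.nn.DataParallel(selfsim_model, device_ids=[0, 1]).cuda()
```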