Vill-Lab / 2021-TIP-IGOAS

Incremental Generative Occlusion Adversarial Suppression Network for Person ReID (IEEE T-IP 2021)
MIT License

The results were too different #1

Closed frigidsun closed 3 years ago

frigidsun commented 3 years ago

Hi, I followed `configs/bfe.yaml` to run your code, but I got results very different from those reported in your paper. The mAP/Rank-1 I get is 77.8%/90.3% on Market-1501 and 38.2%/52.6% on Occluded-Duke. I tried both CUDA 10 and CUDA 9, and there was no difference in the results. Here is my log; could you please help me check whether any hyperparameter is set wrong?

```
python scripts/main.py --config-file configs/bfe.yaml --gpu-devices "8,9" \
    test.eval_freq "10" train.batch_size "64" \
    data.sources "['occludedduke']" data.targets "['occludedduke']"
```

Currently using GPU 8,9

Show configuration:

```yaml
adam:
  beta1: 0.9
  beta2: 0.999
cuhk03:
  classic_split: False
  labeled_images: False
  use_metric_cuhk03: False
data:
  combineall: False
  height: 384
  norm_mean: [0.485, 0.456, 0.406]
  norm_std: [0.229, 0.224, 0.225]
  root: reid-data
  save_dir: log/bre
  sources: ['occludedduke']
  split_id: 0
  targets: ['occludedduke']
  transforms: ['random_flip']
  type: image
  width: 128
  workers: 16
loss:
  name: softmax
  softmax:
    label_smooth: True
  triplet:
    margin: 0.3
    weight_s: 1.0
    weight_t: 1.0
    weight_x: 1.0
market1501:
  use_500k_distractors: False
model:
  load_weights:
  name: bfe
  pretrained: True
  resume:
rmsprop:
  alpha: 0.99
sampler:
  num_instances: 4
  train_sampler: RandomSampler
sgd:
  dampening: 0.0
  momentum: 0.9
  nesterov: False
test:
  batch_size: 128
  dist_metric: euclidean
  eval_freq: 10
  evaluate: False
  normalize_feature: False
  ranks: [1, 3, 5, 10]
  rerank: False
  start_eval: 0
  visactmap: False
  visrank: False
  visrank_topk: 10
train:
  base_lr_mult: 0.1
  batch_size: 64
  fixbase_epoch: 5
  gamma: 0.1
  lr: 0.0003
  lr_scheduler: multi_step
  max_epoch: 90
  new_layers: ['classifier']
  open_layers: ['res_part1', 'res_part2', 'classifier1', 'classifier2', 'reduction1', 'reduction2', 'batch_drop', 'batch_crop', 'batch_erase', 'att1', 'att_module2']
  optim: adam
  print_freq: 200
  seed: 1
  staged_lr: False
  start_epoch: 0
  stepsize: [20, 40]
  weight_decay: 0.0005
use_gpu: True
video:
  pooling_method: avg
  sample_method: evenly
  seq_len: 15
```

Collecting env info ...

```
System info:
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.16.20191017-gf6dac38

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0-9: GeForce RTX 2080 Ti

Nvidia driver version: 430.50
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.3

Versions of relevant libraries:
[pip3] numpy==1.21.0
[pip3] torch==1.4.0
[pip3] torchreid==1.0.6
[pip3] torchvision==0.5.0
[conda] blas        1.0                          mkl
[conda] mkl         2021.2.0         h06a4308_296
[conda] mkl-service 2.3.0          py37h27cfd23_1
[conda] mkl_fft     1.3.0          py37h42c9631_2
[conda] mkl_random  1.2.1          py37ha9443f7_2
[conda] pytorch     1.4.0  py3.7_cuda10.0.130_cudnn7.6.3_0  pytorch
[conda] torchreid   1.0.6                       dev_0
[conda] torchvision 0.5.0              py37_cu100  pytorch
Pillow (8.2.0)
```

```
Building train transforms ...
Building model: bfe
Building softmax-engine for image-reid
=> Start training
```

shuguang-52 commented 3 years ago

I am very sorry, the most critical step in the code was written incorrectly. As shown in Fig. 2 of the paper, the final feature is obtained by concatenating the features of the two branches. The previous (wrong) version was `v3_1 = v1_1 * v2_1`; the correct version is `v3_1 = torch.cat([v1_1, v2_1], 1)`. I have updated the `bfe.py` file; please replace the previous version with the latest `bfe.py`.
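To illustrate why the two versions behave so differently, here is a minimal NumPy sketch with hypothetical batch and feature sizes (the actual dimensions in `bfe.py` may differ); `np.concatenate(..., axis=1)` plays the role of `torch.cat([v1_1, v2_1], 1)` in PyTorch:

```python
import numpy as np

# Hypothetical branch outputs: a batch of 4 samples with 512-dim features each.
v1_1 = np.random.randn(4, 512).astype(np.float32)
v2_1 = np.random.randn(4, 512).astype(np.float32)

# Buggy version: the element-wise product keeps the feature dimension at 512
# and entangles the two branches instead of preserving both.
v3_buggy = v1_1 * v2_1                            # shape (4, 512)

# Fixed version: concatenation along the channel axis keeps both branch
# features side by side, doubling the feature dimension.
v3_fixed = np.concatenate([v1_1, v2_1], axis=1)   # shape (4, 1024)

print(v3_buggy.shape)  # (4, 512)
print(v3_fixed.shape)  # (4, 1024)
```

Note that the fix also changes the final feature dimension, so any layer consuming `v3_1` (e.g. a classifier) must expect the doubled size.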

shuguang-52 commented 3 years ago

I made many changes to the files `torchreid/models/bfe.py` and `torchreid/engine/softmax.py`, so some of the changes may be problematic, and I forgot to restore them. If there are other problems, I will fix them as soon as possible.

frigidsun commented 3 years ago

Thanks for your reply. I pulled your updated source code, confirmed the change in `bfe.py`, and ran the code again, but I got results similar to before: mAP/Rank-1 = 38.0%/51.9% on Occluded-Duke and 77.6%/90.4% on Market-1501.

shuguang-52 commented 3 years ago

To verify the correctness of the code, I re-ran the experiment on Occluded-Duke; the results are in train.log-2021-06-24-19-16-29.txt. As the log file shows, the performance reaches roughly the level reported in the paper around the 60th epoch. I have uploaded the `configs/bfe.yaml`, `torchreid/models/bfe.py`, and `torchreid/engine/softmax.py` that I ran locally this time to GitHub, and this time there should be no problem.