Unable to reproduce Market1501 results with SBS (R50-ibn)

sijun-zhou commented 3 years ago

My environment: Python 3.6, PyTorch 1.2.0, CUDA 10.0.130, apex 0.1, GPU: 2× 2080 Ti

I trained on the Market1501 dataset with two 2080 Ti GPUs, using all the default settings of sbs_R50-ibn.yml, but I cannot reproduce the reported results. My best results are shown below: the highest top-1 is 92.64% and the highest mAP is 78.78% (from two different evaluations), which is far below the model zoo's 95.7% (top-1) and 89.3% (mAP):

##################################### top1 ################################################
[03/22 10:37:55 fastreid.utils.events]: eta: 0:00:27 epoch/iter: 59/11999 total_loss: 12.79 loss_cls: 12.79 loss_triplet: 9.157e-05 time: 0.2132 data_time: 0.0007 lr: 7.00e-07 max_mem: 9573M
[03/22 10:38:22 fastreid.utils.events]: eta: 0:00:00 epoch/iter: 59/12119 total_loss: 12.79 loss_cls: 12.79 loss_triplet: 7.518e-05 time: 0.2134 data_time: 0.0009 lr: 7.00e-07 max_mem: 9573M
[03/22 10:38:23 fastreid.engine.defaults]: Prepare testing set
[03/22 10:38:23 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:

| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| query | 750 | 3368 | 6 |
| gallery | 751 | 15913 | 6 |

[03/22 10:38:23 fastreid.evaluation.evaluator]: Start inference on 19281 images
[03/22 10:38:30 fastreid.evaluation.evaluator]: Inference done 11/151. 0.1033 s / batch. ETA=0:00:14
[03/22 10:38:45 fastreid.evaluation.evaluator]: Total inference time: 0:00:15.542858 (0.106458 s / batch per device, on 2 devices)
[03/22 10:38:45 fastreid.evaluation.evaluator]: Total inference pure compute time: 0:00:15 (0.103480 s / batch per device, on 2 devices)
[03/22 10:40:17 fastreid.engine.defaults]: Evaluation results for Market1501 in csv format:
[03/22 10:40:17 fastreid.evaluation.testing]: Evaluation results in csv format:

| Dataset | Rank-1 | Rank-5 | Rank-10 | mAP | mINP | metric |
|:---------|:--------|:--------|:--------|:--------|:--------|:--------|
| Market1501 | 92.64 | 97.06 | 98.28 | 78.01 | 42.65 | 85.32 |

###########################################################################################

##################################### map ################################################
[03/22 10:27:15 fastreid.utils.events]: eta: 0:08:44 epoch/iter: 48/9799 total_loss: 14.32 loss_cls: 14.32 loss_triplet: 0.001028 time: 0.2104 data_time: 0.0009 lr: 1.04e-04 max_mem: 9573M
[03/22 10:27:38 fastreid.utils.events]: eta: 0:08:22 epoch/iter: 48/9897 total_loss: 14.46 loss_cls: 14.45 loss_triplet: 0.0009101 time: 0.2105 data_time: 0.0006 lr: 1.04e-04 max_mem: 9573M
[03/22 10:28:01 fastreid.utils.events]: eta: 0:07:59 epoch/iter: 49/9999 total_loss: 14.38 loss_cls: 14.38 loss_triplet: 0.001046 time: 0.2107 data_time: 0.0009 lr: 8.80e-05 max_mem: 9573M
[03/22 10:28:24 fastreid.engine.defaults]: Prepare testing set
[03/22 10:28:24 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:

| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| query | 750 | 3368 | 6 |
| gallery | 751 | 15913 | 6 |

[03/22 10:28:24 fastreid.evaluation.evaluator]: Start inference on 19281 images
[03/22 10:28:32 fastreid.evaluation.evaluator]: Inference done 11/151. 0.1015 s / batch. ETA=0:00:14
[03/22 10:28:47 fastreid.evaluation.evaluator]: Total inference time: 0:00:15.644438 (0.107154 s / batch per device, on 2 devices)
[03/22 10:28:47 fastreid.evaluation.evaluator]: Total inference pure compute time: 0:00:15 (0.104161 s / batch per device, on 2 devices)
[03/22 10:30:45 fastreid.engine.defaults]: Evaluation results for Market1501 in csv format:
[03/22 10:30:45 fastreid.evaluation.testing]: Evaluation results in csv format:

| Dataset | Rank-1 | Rank-5 | Rank-10 | mAP | mINP | metric |
|:---------|:--------|:--------|:--------|:--------|:--------|:--------|
| Market1501 | 92.49 | 97.39 | 98.25 | 78.78 | 44.38 | 85.64 |

###########################################################################################
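
To make the gap concrete, the two evaluation rows above can be compared against the model zoo figures quoted in this post. This is only a small sketch: the names `runs`, `reported` and `best_and_gap` are made up for illustration, and the numbers are copied from the logs and the text above.

```python
# Evaluation rows copied from the two logs above (values in percent).
# The keys use the epoch shown in the last training line before each evaluation.
runs = {
    "epoch 59": {"Rank-1": 92.64, "mAP": 78.01},
    "epoch 49": {"Rank-1": 92.49, "mAP": 78.78},
}

# Model zoo figures as quoted in this post.
reported = {"Rank-1": 95.7, "mAP": 89.3}


def best_and_gap(runs, reported):
    """For each metric, return (best run, best value, gap to the reported value)."""
    summary = {}
    for metric, target in reported.items():
        best_run = max(runs, key=lambda name: runs[name][metric])
        best = runs[best_run][metric]
        summary[metric] = (best_run, best, round(target - best, 2))
    return summary


summary = best_and_gap(runs, reported)
for metric, (run, best, gap) in summary.items():
    print(f"{metric}: best {best:.2f} ({run}), {gap:.2f} points below the reported value")
```

The Rank-1 shortfall is about 3 points, while the mAP shortfall is more than 10 points, so the mAP gap is the more striking of the two.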

sijun-zhou commented 3 years ago

Below is my training config and log:

(fastreid) root@sj_docker1_117:/home/wesine/data_8tb_3/sj/work/reid/fast-reid $ cd /home/wesine/data_8tb_3/sj/work/reid/fast-reid ; env PYTHONIOENCODING=UTF-8 PYTHONUNBUFFERED=1 /root/anaconda3/envs/fastreid/bin/python /root/.vscode-server/extensions/ms-python.python-2020.2.64397/pythonFiles/ptvsd_launcher.py --default --nodebug --client --host localhost --port 43535 /home/wesine/data_8tb_3/sj/work/reid/fast-reid/tools/train_net.py --config-file ./configs/Market1501/sbs_R50-ibn.yml --num-gpus 2
Command Line Args: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
[03/22 09:47:02 fastreid]: Rank of current process: 0. World size: 2
[03/22 09:47:03 fastreid]: Environment info:

sys.platform linux
Python 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
numpy 1.19.5
fastreid 1.0.0 @/home/wesine/data_8tb_3/sj/work/reid/fast-reid/fastreid
FASTREID_ENV_MODULE
PyTorch 1.2.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0,1 GeForce RTX 2080 Ti
CUDA_HOME /usr/local/cuda
Pillow 8.1.2
torchvision 0.4.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
cv2 4.5.1

PyTorch built with:

GCC 7.3
Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.18.1 (Git Hash 7de7e5d02bf687f971e7668963649728356e0c20)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CUDA Runtime 10.0
NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
CuDNN 7.6.2
Magma 2.5.1
Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[03/22 09:47:03 fastreid]: Command line arguments: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
[03/22 09:47:03 fastreid]: Contents of args.config_file=./configs/Market1501/sbs_R50-ibn.yml:
_BASE_: ../Base-SBS.yml

MODEL:
  BACKBONE:
    WITH_IBN: True

DATASETS:
  NAMES: ("Market1501",)
  TESTS: ("Market1501",)

OUTPUT_DIR: logs/market1501/sbs_R50-ibn

[03/22 09:47:03 fastreid]: Running with full config:
CUDNN_BENCHMARK: True
DATALOADER:
  NAIVE_WAY: True
  NUM_INSTANCE: 16
  NUM_WORKERS: 8
  PK_SAMPLER: True
DATASETS:
  COMBINEALL: False
  NAMES: ('Market1501',)
  TESTS: ('Market1501',)
INPUT:
  AUGMIX_PROB: 0.0
  AUTOAUG_PROB: 0.1
  CJ:
    BRIGHTNESS: 0.15
    CONTRAST: 0.15
    ENABLED: False
    HUE: 0.1
    PROB: 0.5
    SATURATION: 0.1
  DO_AFFINE: False
  DO_AUGMIX: False
  DO_AUTOAUG: True
  DO_FLIP: True
  DO_PAD: True
  FLIP_PROB: 0.5
  PADDING: 10
  PADDING_MODE: constant
  REA:
    ENABLED: True
    PROB: 0.5
    VALUE: [123.675, 116.28, 103.53]
  RPT:
    ENABLED: False
    PROB: 0.5
  SIZE_TEST: [384, 128]
  SIZE_TRAIN: [384, 128]
KD:
  MODEL_CONFIG: ['']
  MODEL_WEIGHTS: ['']
MODEL:
  BACKBONE:
    DEPTH: 50x
    FEAT_DIM: 2048
    LAST_STRIDE: 1
    NAME: build_resnet_backbone
    NORM: BN
    PRETRAIN: True
    PRETRAIN_PATH:
    WITH_IBN: True
    WITH_NL: True
    WITH_SE: False
  DEVICE: cuda
  FREEZE_LAYERS: ['backbone']
  HEADS:
    CLS_LAYER: circleSoftmax
    EMBEDDING_DIM: 0
    MARGIN: 0.35
    NAME: EmbeddingHead
    NECK_FEAT: after
    NORM: BN
    NUM_CLASSES: 0
    POOL_LAYER: gempoolP
    SCALE: 64
    WITH_BNNECK: True
  LOSSES:
    CE:
      ALPHA: 0.2
      EPSILON: 0.1
      SCALE: 1.0
    CIRCLE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    COSFACE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    FL:
      ALPHA: 0.25
      GAMMA: 2
      SCALE: 1.0
    NAME: ('CrossEntropyLoss', 'TripletLoss')
    TRI:
      HARD_MINING: True
      MARGIN: 0.0
      NORM_FEAT: False
      SCALE: 1.0
  META_ARCHITECTURE: Baseline
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.120000000000005, 57.375]
  QUEUE_SIZE: 8192
  WEIGHTS:
OUTPUT_DIR: logs/market1501/sbs_R50-ibn
SOLVER:
  BASE_LR: 0.00035
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 20
  DELAY_EPOCHS: 30
  ETA_MIN_LR: 7e-07
  FP16_ENABLED: False
  FREEZE_FC_ITERS: 0
  FREEZE_ITERS: 1000
  GAMMA: 0.1
  HEADS_LR_FACTOR: 1.0
  IMS_PER_BATCH: 64
  MAX_EPOCH: 60
  MOMENTUM: 0.9
  NESTEROV: True
  OPT: Adam
  SCHED: CosineAnnealingLR
  STEPS: [40, 90]
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 2000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 0.0005
TEST:
  AQE:
    ALPHA: 3.0
    ENABLED: False
    QE_K: 5
    QE_TIME: 1
  EVAL_PERIOD: 10
  FLIP_ENABLED: False
  IMS_PER_BATCH: 128
  METRIC: cosine
  PRECISE_BN:
    DATASET: Market1501
    ENABLED: False
    NUM_ITER: 300
  RERANK:
    ENABLED: False
    K1: 20
    K2: 6
    LAMBDA: 0.3
  ROC_ENABLED: False
[03/22 09:47:03 fastreid]: Full config saved to /home/wesine/data_8tb_3/sj/work/reid/fast-reid/logs/market1501/sbs_R50-ibn/config.yaml
[03/22 09:47:03 fastreid.utils.env]: Using a generated random seed 3342157
[03/22 09:47:03 fastreid.engine.defaults]: Prepare training set
[03/22 09:47:03 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:

| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| train | 751 | 12936 | 6 |

[03/22 09:47:03 fastreid.engine.defaults]: Auto-scaling the num_classes=751
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Loading pretrained model from /root/.cache/torch/checkpoints/resnet50_ibn_a-d9d0bb7b.pth
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Some model parameters or buffers are not found in the checkpoint:
NL_2.0.g.{weight, bias}
NL_2.0.W.0.{weight, bias}
NL_2.0.W.1.{weight, bias, running_mean, running_var}
NL_2.0.theta.{weight, bias}
NL_2.0.phi.{weight, bias}
NL_2.1.g.{weight, bias}
NL_2.1.W.0.{weight, bias}
NL_2.1.W.1.{weight, bias, running_mean, running_var}
NL_2.1.theta.{weight, bias}
NL_2.1.phi.{weight, bias}
NL_3.0.g.{weight, bias}
NL_3.0.W.0.{weight, bias}
NL_3.0.W.1.{weight, bias, running_mean, running_var}
NL_3.0.theta.{weight, bias}
NL_3.0.phi.{weight, bias}
NL_3.1.g.{weight, bias}
NL_3.1.W.0.{weight, bias}
NL_3.1.W.1.{weight, bias, running_mean, running_var}
NL_3.1.theta.{weight, bias}
NL_3.1.phi.{weight, bias}
NL_3.2.g.{weight, bias}
NL_3.2.W.0.{weight, bias}
NL_3.2.W.1.{weight, bias, running_mean, running_var}
NL_3.2.theta.{weight, bias}
NL_3.2.phi.{weight, bias}
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: The checkpoint state_dict contains keys that are not used by the model:
fc.{weight, bias}
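
The config and dataset numbers printed above determine how each batch is composed and how long an epoch is. The sketch below does that arithmetic. It assumes that a PK-style sampler draws IMS_PER_BATCH / NUM_INSTANCE identities per batch, that IMS_PER_BATCH is the global batch split evenly across GPUs, and that an epoch is the integer division of the training-set size by the batch size. These are assumptions rather than a reading of the fastreid source, and the variable names are illustrative.

```python
# Values copied from the config and dataset summary in this log.
ims_per_batch = 64        # SOLVER.IMS_PER_BATCH
num_instance = 16         # DATALOADER.NUM_INSTANCE
num_train_images = 12936  # "train" row of the dataset table
num_gpus = 2              # --num-gpus 2
max_epoch = 60            # SOLVER.MAX_EPOCH

# Assumed relationships (not taken from the fastreid source):
ids_per_batch = ims_per_batch // num_instance        # identities in one batch
images_per_gpu = ims_per_batch // num_gpus           # per-device share of a batch
iters_per_epoch = num_train_images // ims_per_batch  # full batches per epoch
total_iters = iters_per_epoch * max_epoch

print(f"identities per batch: {ids_per_batch}")
print(f"images per GPU:       {images_per_gpu}")
print(f"iterations per epoch: {iters_per_epoch}")
print(f"total iterations:     {total_iters}")
```

Under these assumptions a batch holds only 4 identities, and the run has 12120 iterations, which agrees with the last logged step in the first comment (epoch/iter: 59/12119, counting from zero).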

Baseline( (backbone): ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True) (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, 
track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): 
Identity() ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) 
(se): Identity() (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): 
Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, 
eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (NL_1): ModuleList() (NL_2): ModuleList( (0): Non_local( (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) ) (1): Non_local( (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(512, eps=1e-05, momentum=0.1, 
affine=True, track_running_stats=True) ) (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) ) ) (NL_3): ModuleList( (0): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) (1): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) (2): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) ) (NL_4): ModuleList() ) (heads): EmbeddingHead( (pool_layer): GeneralizedMeanPoolingP(Parameter containing: tensor([3.], device='cuda:0', requires_grad=True), output_size=1) (bottleneck): Sequential( (0): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (classifier): CircleSoftmax(in_features=2048, num_classes=751, scale=64, margin=0.35) ) ) [03/22 09:47:14 fastreid.utils.checkpoint]: No checkpoint found. 
Training model from scratch
[03/22 09:47:14 fastreid.engine.train_loop]: Starting training from epoch 0
[03/22 09:47:14 fastreid.engine.hooks]: Freeze layer group "backbone" training for 1000 iterations
[03/22 09:47:30 fastreid.utils.events]: eta: 0:12:59 epoch/iter: 0/199 total_loss: 64.82 loss_cls: 50.34 loss_triplet: 14.5 time: 0.0668 data_time: 0.0010 lr: 6.63e-05 max_mem: 9426M
[03/22 09:47:30 fastreid.utils.events]: eta: 0:12:57 epoch/iter: 0/201 total_loss: 64.8 loss_cls: 50.34 loss_triplet: 14.47 time: 0.0667 data_time: 0.0011 lr: 6.67e-05 max_mem: 9426M
[03/22 09:47:44 fastreid.utils.events]: eta: 0:13:04 epoch/iter: 1/399 total_loss: 64.29 loss_cls: 49.86 loss_triplet: 14.4 time: 0.0681 data_time: 0.0007 lr: 9.78e-05 max_mem: 9426M
[03/22 09:47:44 fastreid.utils.events]: eta: 0:13:02 epoch/iter: 1/403 total_loss: 64.29 loss_cls: 49.84 loss_triplet: 14.4 time: 0.0681 data_time: 0.0008 lr: 9.85e-05 max_mem: 9426M
[03/22 09:47:58 fastreid.utils.events]: eta: 0:12:50 epoch/iter: 2/599 total_loss: 63.29 loss_cls: 48.93 loss_triplet: 14.38 time: 0.0681 data_time: 0.0008 lr: 1.29e-04 max_mem: 9426M
[03/22 09:47:58 fastreid.utils.events]: eta: 0:12:48 epoch/iter: 2/605 total_loss: 63.28 loss_cls: 48.91 loss_triplet: 14.42 time: 0.0681 data_time: 0.0007 lr: 1.30e-04 max_mem: 9426M
[03/22 09:48:12 fastreid.utils.events]: eta: 0:12:38 epoch/iter: 3/799 total_loss: 61.76 loss_cls: 47.67 loss_triplet: 14.23 time: 0.0683 data_time: 0.0007 lr: 1.61e-04 max_mem: 9426M
[03/22 09:48:12 fastreid.utils.events]: eta: 0:12:38 epoch/iter: 3/807 total_loss: 61.73 loss_cls: 47.65 loss_triplet: 14.23 time: 0.0684 data_time: 0.0008 lr: 1.62e-04 max_mem: 9426M
[03/22 09:48:25 fastreid.utils.events]: eta: 0:12:21 epoch/iter: 4/999 total_loss: 60.35 loss_cls: 46.21 loss_triplet: 14 time: 0.0682 data_time: 0.0009 lr: 1.92e-04 max_mem: 9426M
[03/22 09:48:25 fastreid.engine.hooks]: Open layer group "backbone" training
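
The learning rates in the log lines above are consistent with a linear warmup from WARMUP_FACTOR × BASE_LR up to BASE_LR over WARMUP_ITERS iterations. The sketch below reproduces them as a sanity check on the schedule. The formula is the common linear-warmup form and is assumed here, not copied from the fastreid scheduler, and `warmup_lr` is a hypothetical helper.

```python
# Values copied from the SOLVER section of the config in this log.
BASE_LR = 0.00035
WARMUP_FACTOR = 0.1
WARMUP_ITERS = 2000


def warmup_lr(iteration):
    """Learning rate during linear warmup (assumed form).

    After warmup this sketch simply returns BASE_LR; the cosine
    annealing stage of the schedule is not modelled here.
    """
    if iteration >= WARMUP_ITERS:
        return BASE_LR
    alpha = iteration / WARMUP_ITERS
    return BASE_LR * (WARMUP_FACTOR * (1 - alpha) + alpha)


# Iterations taken from the epoch/iter field of the log lines above.
for it in (199, 399, 599, 799, 999):
    print(f"iter {it}: lr {warmup_lr(it):.2e}")
```

If the printed values agree with the `lr:` column in the log, the warmup stage is behaving as configured, and the search for the accuracy gap can move on to other settings.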

sijun-zhou commented 3 years ago

Inference accuracy is also far lower than the accuracy posted in the model zoo. Can anyone help me solve this? Thanks in advance!

(fastreid) root@sj_docker1_117:/home/wesine/data_8tb_3/sj/work/reid/fast-reid $ cd /home/wesine/data_8tb_3/sj/work/reid/fast-reid ; env PYTHONIOENCODING=UTF-8 PYTHONUNBUFFERED=1 /root/anaconda3/envs/fastreid/bin/python /root/.vscode-server/extensions/ms-python.python-2020.2.64397/pythonFiles/ptvsd_launcher.py --default --nodebug --client --host localhost --port 41755 /home/wesine/data_8tb_3/sj/work/reid/fast-reid/tools/train_net.py --config-file ./configs/Market1501/sbs_R50-ibn.yml --eval-only MODEL.WEIGHTS logs/market1501/sbs_R50-ibn/model_best.pth MODEL.DEVICE cuda:0
Command Line Args: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=True, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.WEIGHTS', 'logs/market1501/sbs_R50-ibn/model_best.pth', 'MODEL.DEVICE', 'cuda:0'], resume=False)
[03/22 11:49:03 fastreid]: Rank of current process: 0. World size: 1
[03/22 11:49:04 fastreid]: Environment info:

sys.platform linux
Python 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
numpy 1.19.5
fastreid 1.0.0 @/home/wesine/data_8tb_3/sj/work/reid/fast-reid/fastreid
FASTREID_ENV_MODULE
PyTorch 1.2.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0,1 GeForce RTX 2080 Ti
CUDA_HOME /usr/local/cuda
Pillow 8.1.2
torchvision 0.4.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
cv2 4.5.1

PyTorch built with:

GCC 7.3
Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.18.1 (Git Hash 7de7e5d02bf687f971e7668963649728356e0c20)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CUDA Runtime 10.0
NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
CuDNN 7.6.2
Magma 2.5.1
Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[03/22 11:49:04 fastreid]: Command line arguments: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=True, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.WEIGHTS', 'logs/market1501/sbs_R50-ibn/model_best.pth', 'MODEL.DEVICE', 'cuda:0'], resume=False)
[03/22 11:49:04 fastreid]: Contents of args.config_file=./configs/Market1501/sbs_R50-ibn.yml:
_BASE_: ../Base-SBS.yml

MODEL:
  BACKBONE:
    WITH_IBN: True

DATASETS:
  NAMES: ("Market1501",)
  TESTS: ("Market1501",)

OUTPUT_DIR: logs/market1501/sbs_R50-ibn

[03/22 11:49:04 fastreid]: Running with full config:
CUDNN_BENCHMARK: True
DATALOADER:
  NAIVE_WAY: True
  NUM_INSTANCE: 16
  NUM_WORKERS: 8
  PK_SAMPLER: True
DATASETS:
  COMBINEALL: False
  NAMES: ('Market1501',)
  TESTS: ('Market1501',)
INPUT:
  AUGMIX_PROB: 0.0
  AUTOAUG_PROB: 0.1
  CJ:
    BRIGHTNESS: 0.15
    CONTRAST: 0.15
    ENABLED: False
    HUE: 0.1
    PROB: 0.5
    SATURATION: 0.1
  DO_AFFINE: False
  DO_AUGMIX: False
  DO_AUTOAUG: True
  DO_FLIP: True
  DO_PAD: True
  FLIP_PROB: 0.5
  PADDING: 10
  PADDING_MODE: constant
  REA:
    ENABLED: True
    PROB: 0.5
    VALUE: [123.675, 116.28, 103.53]
  RPT:
    ENABLED: False
    PROB: 0.5
  SIZE_TEST: [384, 128]
  SIZE_TRAIN: [384, 128]
KD:
  MODEL_CONFIG: ['']
  MODEL_WEIGHTS: ['']
MODEL:
  BACKBONE:
    DEPTH: 50x
    FEAT_DIM: 2048
    LAST_STRIDE: 1
    NAME: build_resnet_backbone
    NORM: BN
    PRETRAIN: True
    PRETRAIN_PATH:
    WITH_IBN: True
    WITH_NL: True
    WITH_SE: False
  DEVICE: cuda:0
  FREEZE_LAYERS: ['backbone']
  HEADS:
    CLS_LAYER: circleSoftmax
    EMBEDDING_DIM: 0
    MARGIN: 0.35
    NAME: EmbeddingHead
    NECK_FEAT: after
    NORM: BN
    NUM_CLASSES: 0
    POOL_LAYER: gempoolP
    SCALE: 64
    WITH_BNNECK: True
  LOSSES:
    CE:
      ALPHA: 0.2
      EPSILON: 0.1
      SCALE: 1.0
    CIRCLE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    COSFACE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    FL:
      ALPHA: 0.25
      GAMMA: 2
      SCALE: 1.0
    NAME: ('CrossEntropyLoss', 'TripletLoss')
    TRI:
      HARD_MINING: True
      MARGIN: 0.0
      NORM_FEAT: False
      SCALE: 1.0
  META_ARCHITECTURE: Baseline
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.120000000000005, 57.375]
  QUEUE_SIZE: 8192
  WEIGHTS: logs/market1501/sbs_R50-ibn/model_best.pth
OUTPUT_DIR: logs/market1501/sbs_R50-ibn
SOLVER:
  BASE_LR: 0.00035
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 20
  DELAY_EPOCHS: 30
  ETA_MIN_LR: 7e-07
  FP16_ENABLED: False
  FREEZE_FC_ITERS: 0
  FREEZE_ITERS: 1000
  GAMMA: 0.1
  HEADS_LR_FACTOR: 1.0
  IMS_PER_BATCH: 64
  MAX_EPOCH: 60
  MOMENTUM: 0.9
  NESTEROV: True
  OPT: Adam
  SCHED: CosineAnnealingLR
  STEPS: [40, 90]
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 2000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 0.0005
TEST:
  AQE:
    ALPHA: 3.0
    ENABLED: False
    QE_K: 5
    QE_TIME: 1
  EVAL_PERIOD: 10
  FLIP_ENABLED: False
  IMS_PER_BATCH: 128
  METRIC: cosine
  PRECISE_BN:
    DATASET: Market1501
    ENABLED: False
    NUM_ITER: 300
  RERANK:
    ENABLED: False
    K1: 20
    K2: 6
    LAMBDA: 0.3
  ROC_ENABLED: False
[03/22 11:49:04 fastreid]: Full config saved to /home/wesine/data_8tb_3/sj/work/reid/fast-reid/logs/market1501/sbs_R50-ibn/config.yaml
[03/22 11:49:04 fastreid.utils.env]: Using a generated random seed 4471883

Baseline( (backbone): ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True) (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, 
track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): 
Identity() ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) 
(se): Identity() (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): 
Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, 
eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (NL_1): ModuleList() (NL_2): ModuleList( (0): Non_local( (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) ) (1): Non_local( (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(512, eps=1e-05, momentum=0.1, 
affine=True, track_running_stats=True) ) (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) ) ) (NL_3): ModuleList( (0): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) (1): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) (2): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) ) (NL_4): ModuleList() ) (heads): EmbeddingHead( (pool_layer): GeneralizedMeanPoolingP(Parameter containing: tensor([3.], device='cuda:0', requires_grad=True), output_size=1) (bottleneck): Sequential( (0): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (classifier): CircleSoftmax(in_features=2048, num_classes=0, scale=64, margin=0.35) ) ) [03/22 11:49:08 fastreid.utils.checkpoint]: Loading checkpoint from logs/market1501/sbs_R50-ibn/model_best.pth WARNING [03/22 11:49:09 fastreid.utils.checkpoint]: Skip loading parameter 'heads.classifier.weight' to the model due to incompatible shapes: (751, 2048) in the checkpoint but (0, 2048) in the model! You might want to double check if this is expected. 
[03/22 11:49:09 fastreid.utils.checkpoint]: Some model parameters or buffers are not found in the checkpoint: heads.classifier.weight
[03/22 11:49:09 fastreid.engine.defaults]: Prepare testing set
[03/22 11:49:09 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:

| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| query | 750 | 3368 | 6 |
| gallery | 751 | 15913 | 6 |

[03/22 11:49:09 fastreid.evaluation.evaluator]: Start inference on 19281 images
[03/22 11:49:12 fastreid.evaluation.evaluator]: Inference done 11/151. 0.2056 s / batch. ETA=0:00:28
[03/22 11:49:41 fastreid.evaluation.evaluator]: Total inference time: 0:00:30.367810 (0.207999 s / batch per device, on 1 devices)
[03/22 11:49:41 fastreid.evaluation.evaluator]: Total inference pure compute time: 0:00:30 (0.205810 s / batch per device, on 1 devices)
[03/22 11:49:47 fastreid.engine.defaults]: Evaluation results for Market1501 in csv format:
[03/22 11:49:47 fastreid.evaluation.testing]: Evaluation results in csv format:

| Dataset | Rank-1 | Rank-5 | Rank-10 | mAP | mINP | metric |
|:---------|:--------|:--------|:--------|:--------|:--------|:--------|
| Market1501 | 93.94 | 97.57 | 98.28 | 81.89 | 48.57 | 87.91 |

sijun-zhou commented 3 years ago

@L1aoXingyu Hi, L1aoXingyu. If you have time, could you please have a look at this problem? Thanks in advance!

gmt710 commented 3 years ago

Hello, have you loaded the pretrained model successfully? You can join the WeChat group: https://github.com/JDAI-CV/fast-reid/issues/354

L1aoXingyu commented 3 years ago

@sijun-zhou You can firstly try to use 1 GPU to reproduce the results in the model zoo. If you use 2 GPUs, you need to tune batch size twice.

sijun-zhou commented 3 years ago

> Hello, have you loaded the pretrained model successfully? You can join the WeChat group. #354

Hi gmt710, please have a look at the training log I pasted above. It shows that training used the pretrained model: "[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Loading pretrained model from /root/.cache/torch/checkpoints/resnet50_ibn_a-d9d0bb7b.pth".

I have also pasted the relevant snippet of that log here so you can check it, including the missing keys and the keys that are not used:

######################################################
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Loading pretrained model from /root/.cache/torch/checkpoints/resnet50_ibn_a-d9d0bb7b.pth
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Some model parameters or buffers are not found in the checkpoint:
NL_2.0.g.{weight, bias}
NL_2.0.W.0.{weight, bias}
NL_2.0.W.1.{weight, bias, running_mean, running_var}
NL_2.0.theta.{weight, bias}
NL_2.0.phi.{weight, bias}
NL_2.1.g.{weight, bias}
NL_2.1.W.0.{weight, bias}
NL_2.1.W.1.{weight, bias, running_mean, running_var}
NL_2.1.theta.{weight, bias}
NL_2.1.phi.{weight, bias}
NL_3.0.g.{weight, bias}
NL_3.0.W.0.{weight, bias}
NL_3.0.W.1.{weight, bias, running_mean, running_var}
NL_3.0.theta.{weight, bias}
NL_3.0.phi.{weight, bias}
NL_3.1.g.{weight, bias}
NL_3.1.W.0.{weight, bias}
NL_3.1.W.1.{weight, bias, running_mean, running_var}
NL_3.1.theta.{weight, bias}
NL_3.1.phi.{weight, bias}
NL_3.2.g.{weight, bias}
NL_3.2.W.0.{weight, bias}
NL_3.2.W.1.{weight, bias, running_mean, running_var}
NL_3.2.theta.{weight, bias}
NL_3.2.phi.{weight, bias}
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: The checkpoint state_dict contains keys that are not used by the model:
fc.{weight, bias}
######################################################
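For anyone reading these two messages: they amount to a set difference between the parameter names the model expects and the names stored in the checkpoint. The sketch below only illustrates that idea with plain Python lists; `diff_state_dict_keys` and the shortened key lists are hypothetical and are not part of fast-reid.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, both sorted.

    missing    : keys the model has but the checkpoint lacks
    unexpected : keys the checkpoint has but the model does not use
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical, heavily shortened key lists modelled on the log above:
# the non-local (NL_*) layers are absent from the pretrained weights, and
# the pretrained classifier (fc.*) is not used by the re-id backbone.
model_keys = ["conv1.weight", "NL_2.0.g.weight", "NL_2.0.g.bias"]
checkpoint_keys = ["conv1.weight", "fc.weight", "fc.bias"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("not found in checkpoint:", missing)
print("not used by model:", unexpected)
```

Seen this way, the log above says that the shared backbone weights were loaded and only the NL_* layers start from their initial values.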

sijun-zhou commented 3 years ago

> @sijun-zhou You can firstly try to use 1 GPU to reproduce the results in the model zoo. If you use 2 GPUs, you need to tune batch size twice.

@L1aoXingyu Hi, L1aoXingyu, I have tested with 1 GPU and got nearly the same result as the one you posted in the model zoo. Thank you very much!

BTW, I don't quite understand what "you need to tune batch size twice" means if I want to use 2 GPUs. Could you please give me more specific guidelines or a description? Thanks a lot!

L1aoXingyu commented 3 years ago

It means if you want to train a model with 2 GPUs, you need to tune the batch size from 64 to 128.
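As a rough sketch of that rule, assuming `SOLVER.IMS_PER_BATCH` is the total batch size summed over all GPUs (an assumption that matches the 64 to 128 advice above), the value can be scaled so that each GPU still sees the same number of images as in the single-GPU run. The helper name `scaled_total_batch` is hypothetical.

```python
def scaled_total_batch(base_total_batch, base_gpus, num_gpus):
    """Scale a total batch size so the per-GPU batch stays constant."""
    if base_total_batch % base_gpus != 0:
        raise ValueError("base batch size must divide evenly across GPUs")
    per_gpu = base_total_batch // base_gpus
    return per_gpu * num_gpus


# The model-zoo setting in this thread: 64 images on 1 GPU.
two_gpu_batch = scaled_total_batch(64, base_gpus=1, num_gpus=2)
print(two_gpu_batch)  # 128
```

With the default 64 left unchanged on 2 GPUs, each GPU would receive only 32 images per step under this assumption, which is a different training setting from the one used for the model zoo numbers.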

sky186 commented 3 years ago

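@L1aoXingyu Questions about multi-GPU training and testing with the latest code:
1. Training on 2 GPUs with the batch size set to 256 works fine, but at test time the returned results are empty. Single-GPU testing works normally.
2. A hyperparameter question: are the freeze and warmup settings counted in iterations? Should I compute the iteration count from my own dataset size and batch size, usually the number of iterations that corresponds to about 10 epochs? The other hyperparameters seem to be counted in epochs and only these two seem to be in iterations, which is ambiguous. Could you clarify?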

L1aoXingyu commented 3 years ago

@sky186

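1. I'll test this tomorrow.
2. Setting freeze and warmup in iterations is more reasonable. In training settings for some larger datasets, such as face recognition, the whole run is only 16 epochs, so warmup cannot be set in epochs: the warmup you want may be shorter than 1 epoch. The more reasonable approach is therefore to set a number of training iterations directly. The config file is also clear about this: the keys are WARMUP_ITERS and MAX_EPOCH.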
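To relate the two units, an iteration count can be converted to epochs once the number of iterations per epoch is known. The sketch below uses figures from the logs in this thread (the last logged step is epoch/iter 59/12119, so 12120 iterations over MAX_EPOCH: 60, with WARMUP_ITERS: 2000 and FREEZE_ITERS: 1000); the helper names are hypothetical.

```python
def iters_per_epoch(total_iters, max_epoch):
    """Iterations in one epoch, derived from a finished run."""
    return total_iters // max_epoch


def iters_to_epochs(num_iters, per_epoch):
    """Express an iteration count as a (fractional) number of epochs."""
    return num_iters / per_epoch


# Figures taken from the training log and config posted in this issue.
per_epoch = iters_per_epoch(total_iters=12120, max_epoch=60)
warmup_epochs = iters_to_epochs(2000, per_epoch)   # WARMUP_ITERS
freeze_epochs = iters_to_epochs(1000, per_epoch)   # FREEZE_ITERS

print(per_epoch)                 # 202
print(round(warmup_epochs, 2))   # 9.9
print(round(freeze_epochs, 2))   # 4.95
```

So in this particular run the default warmup covers roughly 10 epochs. On a different dataset or batch size the iterations per epoch change, and the same iteration counts then cover a different number of epochs.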

sky186 commented 3 years ago

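@L1aoXingyu Hello, is there any difference between the latest code and the earlier version in the pipeline from data processing to feature extraction? With the earlier version I pulled out a feature-extraction interface, and it worked with correct test results. Now that I have switched to a model and config trained with the latest version, the test results after feature extraction are completely wrong, even though all other settings are the same. There is a lot of code and I don't know how to find what might have changed.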

L1aoXingyu commented 3 years ago

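@sky186 Could it be that the model was not actually loaded? Also, I tested it and multi-GPU testing does run. During multi-GPU testing, results are only returned in the main process.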

sky186 commented 3 years ago

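@L1aoXingyu
1. Yes, after checking, the model parameters were not actually being loaded. I fixed it and it works now. Thank you so much~
2. Thanks for your reply. With multiple GPUs the test results come back empty: in defaults.py, `def test(cls, cfg, model, evaluators=None)` has a test `results` variable, and with multiple GPUs it is empty there. Roughly where is the main-process result you mentioned returned?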

L1aoXingyu commented 3 years ago

@sky186 Where are you reading the test results from? https://github.com/JDAI-CV/fast-reid/blob/25cfa88fd97fbef55abcdd1bf69f2db822306bff/fastreid/evaluation/reid_evaluation.py#L55 The code here means that a non-main process returns an empty {}.
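The pattern described here can be sketched as follows. This is a simplified illustration, not the code at the link: the function `evaluate_on_rank`, its arguments, and the convention that rank 0 is the main process are assumptions made for the example.

```python
def evaluate_on_rank(rank, metrics):
    """Mimic an evaluator that only reports metrics on the main process.

    Every process takes part in inference, but only rank 0 (assumed here
    to be the main process) returns the metrics; all others return {}.
    """
    if rank != 0:
        return {}
    return dict(metrics)


# Simulate a 2-GPU run using the numbers from the single-GPU test above.
metrics = {"Rank-1": 93.94, "mAP": 81.89}
for rank in range(2):
    results = evaluate_on_rank(rank, metrics)
    if results:  # an empty dict on rank 1 is expected, not an error
        print(f"rank {rank}: {results}")
```

Code that consumes the test results therefore needs to either run only in the main process or treat an empty dict as "nothing to report" rather than as a failed evaluation.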

JDAI-CV / fast-reid

unable to reproduce results of Market1501 based on SBS(R50-ibn) results #439
