JDAI-CV / fast-reid

SOTA Re-identification Methods and Toolbox
Apache License 2.0

Unable to reproduce Market1501 results with SBS (R50-ibn) #439

Closed sijun-zhou closed 3 years ago

sijun-zhou commented 3 years ago

My environment: Python 3.6, PyTorch 1.2.0, CUDA 10.0.130, apex 0.1, 2× RTX 2080 Ti GPUs.

I trained on the Market1501 dataset with two RTX 2080 Ti GPUs, using all the default settings in sbs_R50-ibn.yml, but I cannot reproduce the published results. My best Rank-1 is 92.64% and my best mAP is 78.78% (from different checkpoints), far below the model zoo's 95.7% Rank-1 and 89.3% mAP. The logs for both evaluations follow:

##################################### top1 ################################################
[03/22 10:37:55 fastreid.utils.events]: eta: 0:00:27 epoch/iter: 59/11999 total_loss: 12.79 loss_cls: 12.79 loss_triplet: 9.157e-05 time: 0.2132 data_time: 0.0007 lr: 7.00e-07 max_mem: 9573M
[03/22 10:38:22 fastreid.utils.events]: eta: 0:00:00 epoch/iter: 59/12119 total_loss: 12.79 loss_cls: 12.79 loss_triplet: 7.518e-05 time: 0.2134 data_time: 0.0009 lr: 7.00e-07 max_mem: 9573M
[03/22 10:38:23 fastreid.engine.defaults]: Prepare testing set
[03/22 10:38:23 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:

| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| query | 750 | 3368 | 6 |
| gallery | 751 | 15913 | 6 |

[03/22 10:38:23 fastreid.evaluation.evaluator]: Start inference on 19281 images
[03/22 10:38:30 fastreid.evaluation.evaluator]: Inference done 11/151. 0.1033 s / batch. ETA=0:00:14
[03/22 10:38:45 fastreid.evaluation.evaluator]: Total inference time: 0:00:15.542858 (0.106458 s / batch per device, on 2 devices)
[03/22 10:38:45 fastreid.evaluation.evaluator]: Total inference pure compute time: 0:00:15 (0.103480 s / batch per device, on 2 devices)
[03/22 10:40:17 fastreid.engine.defaults]: Evaluation results for Market1501 in csv format:
[03/22 10:40:17 fastreid.evaluation.testing]: Evaluation results in csv format:

| Dataset | Rank-1 | Rank-5 | Rank-10 | mAP | mINP | metric |
|:-----------|:-------|:-------|:--------|:------|:------|:-------|
| Market1501 | 92.64 | 97.06 | 98.28 | 78.01 | 42.65 | 85.32 |

###########################################################################################
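A note for readers of the table: the `metric` column looks like the plain average of Rank-1 and mAP. That reading is inferred from the numbers in this log, not confirmed against the fast-reid source, and the function name `combined_metric` is made up for illustration. A minimal check:

```python
def combined_metric(rank1: float, mAP: float) -> float:
    """Average of Rank-1 and mAP, both given in percent (assumed definition)."""
    return (rank1 + mAP) / 2.0


# Values copied from the evaluation row in the log.
reported = {"rank1": 92.64, "mAP": 78.01, "metric": 85.32}

value = combined_metric(reported["rank1"], reported["mAP"])
# The log prints two decimals, so allow for rounding of the inputs.
matches = abs(value - reported["metric"]) < 0.01
print(f"computed {value:.3f}, logged {reported['metric']}, consistent: {matches}")
```

If that assumption holds, the combined score drops with either Rank-1 or mAP, so the gap to the model zoo shows up in this column as well.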

##################################### map ################################################
[03/22 10:27:15 fastreid.utils.events]: eta: 0:08:44 epoch/iter: 48/9799 total_loss: 14.32 loss_cls: 14.32 loss_triplet: 0.001028 time: 0.2104 data_time: 0.0009 lr: 1.04e-04 max_mem: 9573M
[03/22 10:27:38 fastreid.utils.events]: eta: 0:08:22 epoch/iter: 48/9897 total_loss: 14.46 loss_cls: 14.45 loss_triplet: 0.0009101 time: 0.2105 data_time: 0.0006 lr: 1.04e-04 max_mem: 9573M
[03/22 10:28:01 fastreid.utils.events]: eta: 0:07:59 epoch/iter: 49/9999 total_loss: 14.38 loss_cls: 14.38 loss_triplet: 0.001046 time: 0.2107 data_time: 0.0009 lr: 8.80e-05 max_mem: 9573M
[03/22 10:28:24 fastreid.engine.defaults]: Prepare testing set
[03/22 10:28:24 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:

| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| query | 750 | 3368 | 6 |
| gallery | 751 | 15913 | 6 |

[03/22 10:28:24 fastreid.evaluation.evaluator]: Start inference on 19281 images
[03/22 10:28:32 fastreid.evaluation.evaluator]: Inference done 11/151. 0.1015 s / batch. ETA=0:00:14
[03/22 10:28:47 fastreid.evaluation.evaluator]: Total inference time: 0:00:15.644438 (0.107154 s / batch per device, on 2 devices)
[03/22 10:28:47 fastreid.evaluation.evaluator]: Total inference pure compute time: 0:00:15 (0.104161 s / batch per device, on 2 devices)
[03/22 10:30:45 fastreid.engine.defaults]: Evaluation results for Market1501 in csv format:
[03/22 10:30:45 fastreid.evaluation.testing]: Evaluation results in csv format:

| Dataset | Rank-1 | Rank-5 | Rank-10 | mAP | mINP | metric |
|:-----------|:-------|:-------|:--------|:------|:------|:-------|
| Market1501 | 92.49 | 97.39 | 98.25 | 78.78 | 44.38 | 85.64 |

###########################################################################################
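The learning rates in these two excerpts (1.04e-04 at epoch 48, 8.80e-05 at epoch 49, 7.00e-07 at epoch 59) are consistent with a delayed cosine decay using the values posted in this thread: BASE_LR 0.00035, ETA_MIN_LR 7e-07, DELAY_EPOCHS 30, MAX_EPOCH 60. The sketch below is an assumption fitted to the log, not fast-reid's scheduler code; in particular the one-epoch offset and the name `cosine_lr` are my own.

```python
import math


def cosine_lr(epoch: int, base_lr: float = 3.5e-4, eta_min: float = 7e-7,
              delay_epochs: int = 30, max_epoch: int = 60) -> float:
    """Delayed cosine annealing, evaluated once per epoch (assumed form)."""
    # Assumption: the rate stays at base_lr until the delay has passed, then
    # follows half a cosine period over the remaining epochs. The "+ 1"
    # offset is chosen only because it matches the logged values.
    t = epoch + 1 - delay_epochs
    if t <= 0:
        return base_lr
    period = max_epoch - delay_epochs
    return eta_min + (base_lr - eta_min) * 0.5 * (1.0 + math.cos(math.pi * t / period))


# Logged learning rates, keyed by the epoch shown in "epoch/iter".
logged = {48: "1.04e-04", 49: "8.80e-05", 59: "7.00e-07"}
for epoch, lr_text in logged.items():
    print(epoch, f"{cosine_lr(epoch):.2e}", lr_text)
```

If the computed and logged columns agree, the schedule itself ran as configured, which would point the accuracy gap at something other than the learning-rate decay.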

sijun-zhou commented 3 years ago

Below is my training config:
(fastreid) root@sj_docker1_117:/home/wesine/data_8tb_3/sj/work/reid/fast-reid $ cd /home/wesine/data_8tb_3/sj/work/reid/fast-reid ; env PYTHONIOENCODING=UTF-8 PYTHONUNBUFFERED=1 /root/anaconda3/envs/fastreid/bin/python /root/.vscode-server/extensions/ms-python.python-2020.2.64397/pythonFiles/ptvsd_launcher.py --default --nodebug --client --host localhost --port 43535 /home/wesine/data_8tb_3/sj/work/reid/fast-reid/tools/train_net.py --config-file ./configs/Market1501/sbs_R50-ibn.yml --num-gpus 2
Command Line Args: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
[03/22 09:47:02 fastreid]: Rank of current process: 0. World size: 2
[03/22 09:47:03 fastreid]: Environment info:


sys.platform            linux
Python                  3.6.13 packaged by conda-forge (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
numpy                   1.19.5
fastreid                1.0.0 @/home/wesine/data_8tb_3/sj/work/reid/fast-reid/fastreid
FASTREID_ENV_MODULE
PyTorch                 1.2.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torch
PyTorch debug build     False
GPU available           True
GPU 0,1                 GeForce RTX 2080 Ti
CUDA_HOME               /usr/local/cuda
Pillow                  8.1.2
torchvision             0.4.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torchvision
torchvision arch flags  sm_35, sm_50, sm_60, sm_70, sm_75
cv2                     4.5.1

PyTorch built with:

[03/22 09:47:03 fastreid]: Command line arguments: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
[03/22 09:47:03 fastreid]: Contents of args.config_file=./configs/Market1501/sbs_R50-ibn.yml:

_BASE_: ../Base-SBS.yml

MODEL:
  BACKBONE:
    WITH_IBN: True

DATASETS:
  NAMES: ("Market1501",)
  TESTS: ("Market1501",)

OUTPUT_DIR: logs/market1501/sbs_R50-ibn

[03/22 09:47:03 fastreid]: Running with full config:

CUDNN_BENCHMARK: True
DATALOADER:
  NAIVE_WAY: True
  NUM_INSTANCE: 16
  NUM_WORKERS: 8
  PK_SAMPLER: True
DATASETS:
  COMBINEALL: False
  NAMES: ('Market1501',)
  TESTS: ('Market1501',)
INPUT:
  AUGMIX_PROB: 0.0
  AUTOAUG_PROB: 0.1
  CJ:
    BRIGHTNESS: 0.15
    CONTRAST: 0.15
    ENABLED: False
    HUE: 0.1
    PROB: 0.5
    SATURATION: 0.1
  DO_AFFINE: False
  DO_AUGMIX: False
  DO_AUTOAUG: True
  DO_FLIP: True
  DO_PAD: True
  FLIP_PROB: 0.5
  PADDING: 10
  PADDING_MODE: constant
  REA:
    ENABLED: True
    PROB: 0.5
    VALUE: [123.675, 116.28, 103.53]
  RPT:
    ENABLED: False
    PROB: 0.5
  SIZE_TEST: [384, 128]
  SIZE_TRAIN: [384, 128]
KD:
  MODEL_CONFIG: ['']
  MODEL_WEIGHTS: ['']
MODEL:
  BACKBONE:
    DEPTH: 50x
    FEAT_DIM: 2048
    LAST_STRIDE: 1
    NAME: build_resnet_backbone
    NORM: BN
    PRETRAIN: True
    PRETRAIN_PATH:
    WITH_IBN: True
    WITH_NL: True
    WITH_SE: False
  DEVICE: cuda
  FREEZE_LAYERS: ['backbone']
  HEADS:
    CLS_LAYER: circleSoftmax
    EMBEDDING_DIM: 0
    MARGIN: 0.35
    NAME: EmbeddingHead
    NECK_FEAT: after
    NORM: BN
    NUM_CLASSES: 0
    POOL_LAYER: gempoolP
    SCALE: 64
    WITH_BNNECK: True
  LOSSES:
    CE:
      ALPHA: 0.2
      EPSILON: 0.1
      SCALE: 1.0
    CIRCLE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    COSFACE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    FL:
      ALPHA: 0.25
      GAMMA: 2
      SCALE: 1.0
    NAME: ('CrossEntropyLoss', 'TripletLoss')
    TRI:
      HARD_MINING: True
      MARGIN: 0.0
      NORM_FEAT: False
      SCALE: 1.0
  META_ARCHITECTURE: Baseline
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.120000000000005, 57.375]
  QUEUE_SIZE: 8192
  WEIGHTS:
OUTPUT_DIR: logs/market1501/sbs_R50-ibn
SOLVER:
  BASE_LR: 0.00035
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 20
  DELAY_EPOCHS: 30
  ETA_MIN_LR: 7e-07
  FP16_ENABLED: False
  FREEZE_FC_ITERS: 0
  FREEZE_ITERS: 1000
  GAMMA: 0.1
  HEADS_LR_FACTOR: 1.0
  IMS_PER_BATCH: 64
  MAX_EPOCH: 60
  MOMENTUM: 0.9
  NESTEROV: True
  OPT: Adam
  SCHED: CosineAnnealingLR
  STEPS: [40, 90]
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 2000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 0.0005
TEST:
  AQE:
    ALPHA: 3.0
    ENABLED: False
    QE_K: 5
    QE_TIME: 1
  EVAL_PERIOD: 10
  FLIP_ENABLED: False
  IMS_PER_BATCH: 128
  METRIC: cosine
  PRECISE_BN:
    DATASET: Market1501
    ENABLED: False
    NUM_ITER: 300
  RERANK:
    ENABLED: False
    K1: 20
    K2: 6
    LAMBDA: 0.3
  ROC_ENABLED: False

[03/22 09:47:03 fastreid]: Full config saved to /home/wesine/data_8tb_3/sj/work/reid/fast-reid/logs/market1501/sbs_R50-ibn/config.yaml
[03/22 09:47:03 fastreid.utils.env]: Using a generated random seed 3342157
[03/22 09:47:03 fastreid.engine.defaults]: Prepare training set
[03/22 09:47:03 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format:

| subset | # ids | # images | # cameras |
|:---------|:--------|:-----------|:------------|
| train | 751 | 12936 | 6 |

[03/22 09:47:03 fastreid.engine.defaults]: Auto-scaling the num_classes=751
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Loading pretrained model from /root/.cache/torch/checkpoints/resnet50_ibn_a-d9d0bb7b.pth
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Some model parameters or buffers are not found in the checkpoint:
  NL_2.0.g.{weight, bias}
  NL_2.0.W.0.{weight, bias}
  NL_2.0.W.1.{weight, bias, running_mean, running_var}
  NL_2.0.theta.{weight, bias}
  NL_2.0.phi.{weight, bias}
  NL_2.1.g.{weight, bias}
  NL_2.1.W.0.{weight, bias}
  NL_2.1.W.1.{weight, bias, running_mean, running_var}
  NL_2.1.theta.{weight, bias}
  NL_2.1.phi.{weight, bias}
  NL_3.0.g.{weight, bias}
  NL_3.0.W.0.{weight, bias}
  NL_3.0.W.1.{weight, bias, running_mean, running_var}
  NL_3.0.theta.{weight, bias}
  NL_3.0.phi.{weight, bias}
  NL_3.1.g.{weight, bias}
  NL_3.1.W.0.{weight, bias}
  NL_3.1.W.1.{weight, bias, running_mean, running_var}
  NL_3.1.theta.{weight, bias}
  NL_3.1.phi.{weight, bias}
  NL_3.2.g.{weight, bias}
  NL_3.2.W.0.{weight, bias}
  NL_3.2.W.1.{weight, bias, running_mean, running_var}
  NL_3.2.theta.{weight, bias}
  NL_3.2.phi.{weight, bias}
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: The checkpoint state_dict contains keys that are not used by the model:
  fc.{weight, bias}
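The iteration counters in the training log line up with the dataset size and solver settings posted here: 12936 training images, IMS_PER_BATCH 64, MAX_EPOCH 60, FREEZE_ITERS 1000. The arithmetic below assumes iterations per epoch are obtained by floor division; that is inferred from the logged `epoch/iter` pairs rather than read from the fast-reid code.

```python
num_train_images = 12936   # "train" row of the dataset table
ims_per_batch = 64         # SOLVER.IMS_PER_BATCH
max_epoch = 60             # SOLVER.MAX_EPOCH
freeze_iters = 1000        # SOLVER.FREEZE_ITERS

# Assumed: a partial final batch is dropped, hence floor division.
iters_per_epoch = num_train_images // ims_per_batch
max_iter = iters_per_epoch * max_epoch
# 0-based epoch that contains the last frozen iteration (iteration 999).
unfreeze_epoch = (freeze_iters - 1) // iters_per_epoch

print(iters_per_epoch, max_iter, unfreeze_epoch)
```

Under that assumption the last iteration index is `max_iter - 1`, matching the final `59/12119` entry in the log, and the backbone is unfrozen during epoch 4, matching the `4/999` entry.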

Baseline( (backbone): ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True) (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, 
track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): 
Identity() ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) 
(se): Identity() (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): 
Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, 
eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (NL_1): ModuleList() (NL_2): ModuleList( (0): Non_local( (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) ) (1): Non_local( (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(512, eps=1e-05, momentum=0.1, 
affine=True, track_running_stats=True) ) (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) ) ) (NL_3): ModuleList( (0): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) (1): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) (2): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) ) (NL_4): ModuleList() ) (heads): EmbeddingHead( (pool_layer): GeneralizedMeanPoolingP(Parameter containing: tensor([3.], device='cuda:0', requires_grad=True), output_size=1) (bottleneck): Sequential( (0): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (classifier): CircleSoftmax(in_features=2048, num_classes=751, scale=64, margin=0.35) ) ) [03/22 09:47:14 fastreid.utils.checkpoint]: No checkpoint found. 
Training model from scratch
[03/22 09:47:14 fastreid.engine.train_loop]: Starting training from epoch 0
[03/22 09:47:14 fastreid.engine.hooks]: Freeze layer group "backbone" training for 1000 iterations
[03/22 09:47:30 fastreid.utils.events]: eta: 0:12:59 epoch/iter: 0/199 total_loss: 64.82 loss_cls: 50.34 loss_triplet: 14.5 time: 0.0668 data_time: 0.0010 lr: 6.63e-05 max_mem: 9426M
[03/22 09:47:30 fastreid.utils.events]: eta: 0:12:57 epoch/iter: 0/201 total_loss: 64.8 loss_cls: 50.34 loss_triplet: 14.47 time: 0.0667 data_time: 0.0011 lr: 6.67e-05 max_mem: 9426M
[03/22 09:47:44 fastreid.utils.events]: eta: 0:13:04 epoch/iter: 1/399 total_loss: 64.29 loss_cls: 49.86 loss_triplet: 14.4 time: 0.0681 data_time: 0.0007 lr: 9.78e-05 max_mem: 9426M
[03/22 09:47:44 fastreid.utils.events]: eta: 0:13:02 epoch/iter: 1/403 total_loss: 64.29 loss_cls: 49.84 loss_triplet: 14.4 time: 0.0681 data_time: 0.0008 lr: 9.85e-05 max_mem: 9426M
[03/22 09:47:58 fastreid.utils.events]: eta: 0:12:50 epoch/iter: 2/599 total_loss: 63.29 loss_cls: 48.93 loss_triplet: 14.38 time: 0.0681 data_time: 0.0008 lr: 1.29e-04 max_mem: 9426M
[03/22 09:47:58 fastreid.utils.events]: eta: 0:12:48 epoch/iter: 2/605 total_loss: 63.28 loss_cls: 48.91 loss_triplet: 14.42 time: 0.0681 data_time: 0.0007 lr: 1.30e-04 max_mem: 9426M
[03/22 09:48:12 fastreid.utils.events]: eta: 0:12:38 epoch/iter: 3/799 total_loss: 61.76 loss_cls: 47.67 loss_triplet: 14.23 time: 0.0683 data_time: 0.0007 lr: 1.61e-04 max_mem: 9426M
[03/22 09:48:12 fastreid.utils.events]: eta: 0:12:38 epoch/iter: 3/807 total_loss: 61.73 loss_cls: 47.65 loss_triplet: 14.23 time: 0.0684 data_time: 0.0008 lr: 1.62e-04 max_mem: 9426M
[03/22 09:48:25 fastreid.utils.events]: eta: 0:12:21 epoch/iter: 4/999 total_loss: 60.35 loss_cls: 46.21 loss_triplet: 14 time: 0.0682 data_time: 0.0009 lr: 1.92e-04 max_mem: 9426M
[03/22 09:48:25 fastreid.engine.hooks]: Open layer group "backbone" training
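The `lr` values in these first entries follow a linear warmup with the posted settings (BASE_LR 0.00035, WARMUP_FACTOR 0.1, WARMUP_ITERS 2000, WARMUP_METHOD linear). The formula below is the usual linear-warmup form and reproduces the logged numbers; whether fast-reid implements it exactly this way is an assumption, and `warmup_lr` is a name chosen for this sketch.

```python
def warmup_lr(it: int, base_lr: float = 3.5e-4,
              warmup_factor: float = 0.1, warmup_iters: int = 2000) -> float:
    """Linear warmup: scale base_lr from warmup_factor up to 1.0 (assumed form)."""
    if it >= warmup_iters:
        return base_lr
    alpha = it / warmup_iters
    return base_lr * (warmup_factor * (1.0 - alpha) + alpha)


# Learning rates copied from the log, keyed by iteration index.
logged = {199: "6.63e-05", 201: "6.67e-05", 399: "9.78e-05", 999: "1.92e-04"}
for it, lr_text in logged.items():
    print(it, f"{warmup_lr(it):.2e}", lr_text)
```

Agreement between the two columns suggests the warmup ran as configured, so the early part of the schedule is unlikely to explain the lower accuracy.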

sijun-zhou commented 3 years ago

Inference accuracy is also far lower than the accuracy posted in the model zoo. Can anyone help me sort this out? Thanks in advance!

(fastreid) root@sj_docker1_117:/home/wesine/data_8tb_3/sj/work/reid/fast-reid $ cd /home/wesine/data_8tb_3/sj/work/reid/fast-reid ; env PYTHONIOENCODING=UTF-8 PYTHONUNBUFFERED=1 /root/anaconda3/envs/fastreid/bin/python /root/.vscode-server/extensions/ms-python.python-2020.2.64397/pythonFiles/ptvsd_launcher.py --default --nodebug --client --host localhost --port 41755 /home/wesine/data_8tb_3/sj/work/reid/fast-reid/tools/train_net.py --config-file ./configs/Market1501/sbs_R50-ibn.yml --eval-only MODEL.WEIGHTS logs/market1501/sbs_R50-ibn/model_best.pth MODEL.DEVICE cuda:0
Command Line Args: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=True, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.WEIGHTS', 'logs/market1501/sbs_R50-ibn/model_best.pth', 'MODEL.DEVICE', 'cuda:0'], resume=False)
[03/22 11:49:03 fastreid]: Rank of current process: 0. World size: 1
[03/22 11:49:04 fastreid]: Environment info:


sys.platform            linux
Python                  3.6.13 packaged by conda-forge (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
numpy                   1.19.5
fastreid                1.0.0 @/home/wesine/data_8tb_3/sj/work/reid/fast-reid/fastreid
FASTREID_ENV_MODULE
PyTorch                 1.2.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torch
PyTorch debug build     False
GPU available           True
GPU 0,1                 GeForce RTX 2080 Ti
CUDA_HOME               /usr/local/cuda
Pillow                  8.1.2
torchvision             0.4.0 @/root/anaconda3/envs/fastreid/lib/python3.6/site-packages/torchvision
torchvision arch flags  sm_35, sm_50, sm_60, sm_70, sm_75
cv2                     4.5.1

PyTorch built with:

[03/22 11:49:04 fastreid]: Command line arguments: Namespace(config_file='./configs/Market1501/sbs_R50-ibn.yml', dist_url='tcp://127.0.0.1:49152', eval_only=True, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.WEIGHTS', 'logs/market1501/sbs_R50-ibn/model_best.pth', 'MODEL.DEVICE', 'cuda:0'], resume=False)
[03/22 11:49:04 fastreid]: Contents of args.config_file=./configs/Market1501/sbs_R50-ibn.yml:

_BASE_: ../Base-SBS.yml

MODEL:
  BACKBONE:
    WITH_IBN: True

DATASETS:
  NAMES: ("Market1501",)
  TESTS: ("Market1501",)

OUTPUT_DIR: logs/market1501/sbs_R50-ibn

[03/22 11:49:04 fastreid]: Running with full config:

CUDNN_BENCHMARK: True
DATALOADER:
  NAIVE_WAY: True
  NUM_INSTANCE: 16
  NUM_WORKERS: 8
  PK_SAMPLER: True
DATASETS:
  COMBINEALL: False
  NAMES: ('Market1501',)
  TESTS: ('Market1501',)
INPUT:
  AUGMIX_PROB: 0.0
  AUTOAUG_PROB: 0.1
  CJ:
    BRIGHTNESS: 0.15
    CONTRAST: 0.15
    ENABLED: False
    HUE: 0.1
    PROB: 0.5
    SATURATION: 0.1
  DO_AFFINE: False
  DO_AUGMIX: False
  DO_AUTOAUG: True
  DO_FLIP: True
  DO_PAD: True
  FLIP_PROB: 0.5
  PADDING: 10
  PADDING_MODE: constant
  REA:
    ENABLED: True
    PROB: 0.5
    VALUE: [123.675, 116.28, 103.53]
  RPT:
    ENABLED: False
    PROB: 0.5
  SIZE_TEST: [384, 128]
  SIZE_TRAIN: [384, 128]
KD:
  MODEL_CONFIG: ['']
  MODEL_WEIGHTS: ['']
MODEL:
  BACKBONE:
    DEPTH: 50x
    FEAT_DIM: 2048
    LAST_STRIDE: 1
    NAME: build_resnet_backbone
    NORM: BN
    PRETRAIN: True
    PRETRAIN_PATH:
    WITH_IBN: True
    WITH_NL: True
    WITH_SE: False
  DEVICE: cuda:0
  FREEZE_LAYERS: ['backbone']
  HEADS:
    CLS_LAYER: circleSoftmax
    EMBEDDING_DIM: 0
    MARGIN: 0.35
    NAME: EmbeddingHead
    NECK_FEAT: after
    NORM: BN
    NUM_CLASSES: 0
    POOL_LAYER: gempoolP
    SCALE: 64
    WITH_BNNECK: True
  LOSSES:
    CE:
      ALPHA: 0.2
      EPSILON: 0.1
      SCALE: 1.0
    CIRCLE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    COSFACE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    FL:
      ALPHA: 0.25
      GAMMA: 2
      SCALE: 1.0
    NAME: ('CrossEntropyLoss', 'TripletLoss')
    TRI:
      HARD_MINING: True
      MARGIN: 0.0
      NORM_FEAT: False
      SCALE: 1.0
  META_ARCHITECTURE: Baseline
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.120000000000005, 57.375]
  QUEUE_SIZE: 8192
  WEIGHTS: logs/market1501/sbs_R50-ibn/model_best.pth
OUTPUT_DIR: logs/market1501/sbs_R50-ibn
SOLVER:
  BASE_LR: 0.00035
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 20
  DELAY_EPOCHS: 30
  ETA_MIN_LR: 7e-07
  FP16_ENABLED: False
  FREEZE_FC_ITERS: 0
  FREEZE_ITERS: 1000
  GAMMA: 0.1
  HEADS_LR_FACTOR: 1.0
  IMS_PER_BATCH: 64
  MAX_EPOCH: 60
  MOMENTUM: 0.9
  NESTEROV: True
  OPT: Adam
  SCHED: CosineAnnealingLR
  STEPS: [40, 90]
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 2000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 0.0005
TEST:
  AQE:
    ALPHA: 3.0
    ENABLED: False
    QE_K: 5
    QE_TIME: 1
  EVAL_PERIOD: 10
  FLIP_ENABLED: False
  IMS_PER_BATCH: 128
  METRIC: cosine
  PRECISE_BN:
    DATASET: Market1501
    ENABLED: False
    NUM_ITER: 300
  RERANK:
    ENABLED: False
    K1: 20
    K2: 6
    LAMBDA: 0.3
  ROC_ENABLED: False

[03/22 11:49:04 fastreid]: Full config saved to /home/wesine/data_8tb_3/sj/work/reid/fast-reid/logs/market1501/sbs_R50-ibn/config.yaml
[03/22 11:49:04 fastreid.utils.env]: Using a generated random seed 4471883

Baseline( (backbone): ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True) (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, 
track_running_stats=False) (BN): BatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): 
Identity() ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) 
(se): Identity() (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): 
Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): IBN( (IN): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) (BN): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, 
eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (NL_1): ModuleList() (NL_2): ModuleList( (0): Non_local( (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) ) (1): Non_local( (g): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 512, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(512, eps=1e-05, momentum=0.1, 
affine=True, track_running_stats=True) ) (theta): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(512, 1, kernel_size=(1, 1), stride=(1, 1)) ) ) (NL_3): ModuleList( (0): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) (1): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) (2): Non_local( (g): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (W): Sequential( (0): Conv2d(1, 1024, kernel_size=(1, 1), stride=(1, 1)) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (theta): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) (phi): Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)) ) ) (NL_4): ModuleList() ) (heads): EmbeddingHead( (pool_layer): GeneralizedMeanPoolingP(Parameter containing: tensor([3.], device='cuda:0', requires_grad=True), output_size=1) (bottleneck): Sequential( (0): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (classifier): CircleSoftmax(in_features=2048, num_classes=0, scale=64, margin=0.35) ) ) [03/22 11:49:08 fastreid.utils.checkpoint]: Loading checkpoint from logs/market1501/sbs_R50-ibn/model_best.pth WARNING [03/22 11:49:09 fastreid.utils.checkpoint]: Skip loading parameter 'heads.classifier.weight' to the model due to incompatible shapes: (751, 2048) in the checkpoint but (0, 2048) in the model! You might want to double check if this is expected. 
[03/22 11:49:09 fastreid.utils.checkpoint]: Some model parameters or buffers are not found in the checkpoint: heads.classifier.weight [03/22 11:49:09 fastreid.engine.defaults]: Prepare testing set [03/22 11:49:09 fastreid.data.datasets.bases]: => Loaded Market1501 in csv format: subset # ids # images # cameras :--------- :-------- :----------- :------------ query 750 3368 6 gallery 751 15913 6 [03/22 11:49:09 fastreid.evaluation.evaluator]: Start inference on 19281 images [03/22 11:49:12 fastreid.evaluation.evaluator]: Inference done 11/151. 0.2056 s / batch. ETA=0:00:28 [03/22 11:49:41 fastreid.evaluation.evaluator]: Total inference time: 0:00:30.367810 (0.207999 s / batch per device, on 1 devices) [03/22 11:49:41 fastreid.evaluation.evaluator]: Total inference pure compute time: 0:00:30 (0.205810 s / batch per device, on 1 devices) [03/22 11:49:47 fastreid.engine.defaults]: Evaluation results for Market1501 in csv format: [03/22 11:49:47 fastreid.evaluation.testing]: Evaluation results in csv format: Dataset Rank-1 Rank-5 Rank-10 mAP mINP metric
Market1501 93.94 97.57 98.28 81.89 48.57 87.91
sijun-zhou commented 3 years ago

@L1aoXingyu Hi, L1aoXingyu. If you have time, could you please take a look at this problem? Thanks in advance!

gmt710 commented 3 years ago

Hello, have you loaded the pretrained model successfully? You can also join the WeChat group: https://github.com/JDAI-CV/fast-reid/issues/354

L1aoXingyu commented 3 years ago

@sijun-zhou You can firstly try to use 1 GPU to reproduce the results in the model zoo. If you use 2 GPUs, you need to tune batch size twice.

sijun-zhou commented 3 years ago

Hello, have you loaded the pretrained model successfully? You can also join the WeChat group: #354

Hi gmt710, please take a look at the training log I pasted above. It shows that training used the pretrained model: "[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Loading pretrained model from /root/.cache/torch/checkpoints/resnet50_ibn_a-d9d0bb7b.pth".

I have also pasted the relevant snippet of that log here so you can check it, including the missing keys and the unused keys:

######################################################
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Loading pretrained model from /root/.cache/torch/checkpoints/resnet50_ibn_a-d9d0bb7b.pth
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: Some model parameters or buffers are not found in the checkpoint:
NL_2.0.g.{weight, bias}
NL_2.0.W.0.{weight, bias}
NL_2.0.W.1.{weight, bias, running_mean, running_var}
NL_2.0.theta.{weight, bias}
NL_2.0.phi.{weight, bias}
NL_2.1.g.{weight, bias}
NL_2.1.W.0.{weight, bias}
NL_2.1.W.1.{weight, bias, running_mean, running_var}
NL_2.1.theta.{weight, bias}
NL_2.1.phi.{weight, bias}
NL_3.0.g.{weight, bias}
NL_3.0.W.0.{weight, bias}
NL_3.0.W.1.{weight, bias, running_mean, running_var}
NL_3.0.theta.{weight, bias}
NL_3.0.phi.{weight, bias}
NL_3.1.g.{weight, bias}
NL_3.1.W.0.{weight, bias}
NL_3.1.W.1.{weight, bias, running_mean, running_var}
NL_3.1.theta.{weight, bias}
NL_3.1.phi.{weight, bias}
NL_3.2.g.{weight, bias}
NL_3.2.W.0.{weight, bias}
NL_3.2.W.1.{weight, bias, running_mean, running_var}
NL_3.2.theta.{weight, bias}
NL_3.2.phi.{weight, bias}
[03/22 09:47:04 fastreid.modeling.backbones.resnet]: The checkpoint state_dict contains keys that are not used by the model:
fc.{weight, bias}
######################################################

sijun-zhou commented 3 years ago

@sijun-zhou You can firstly try to use 1 GPU to reproduce the results in the model zoo. If you use 2 GPUs, you need to tune batch size twice.

@L1aoXingyu Hi, L1aoXingyu, I have tested with 1 GPU and got nearly the same result as the one you posted in the model zoo. Thank you very much!

BTW, I don't quite understand what "you need to tune batch size twice" means if I want to use 2 GPUs. Could you please give me more specific guidance? Thanks a lot!

L1aoXingyu commented 3 years ago

It means if you want to train a model with 2 GPUs, you need to tune the batch size from 64 to 128.
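A minimal sketch of the arithmetic behind this advice, assuming the configured batch size is the total across all GPUs and is split evenly between them. That assumption and the helper names `scaled_batch_size` and `per_gpu_batch` are illustrative and not taken from the fast-reid code:

```python
def scaled_batch_size(single_gpu_batch: int, num_gpus: int) -> int:
    """Total batch size that keeps the per-GPU batch equal to the single-GPU run."""
    if single_gpu_batch <= 0 or num_gpus <= 0:
        raise ValueError("batch size and GPU count must be positive")
    return single_gpu_batch * num_gpus


def per_gpu_batch(total_batch: int, num_gpus: int) -> int:
    """Images each GPU processes per step when the total batch is split evenly."""
    if total_batch % num_gpus != 0:
        raise ValueError("total batch must be divisible by the GPU count")
    return total_batch // num_gpus


# Leaving the batch size at 64 on 2 GPUs halves what each GPU sees.
print(per_gpu_batch(64, 2))                          # 32
# Doubling it to 128 restores the single-GPU per-device batch of 64.
print(per_gpu_batch(scaled_batch_size(64, 2), 2))    # 64
```

Under that assumption, an unchanged batch size of 64 on 2 GPUs gives each GPU only 32 images per step, which is a different training setup from the single-GPU run.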

sky186 commented 3 years ago

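@L1aoXingyu I have some questions about multi-GPU training and testing with the latest code.
1. Training on 2 GPUs with the batch size set to 256 works fine, but at test time the returned result is empty. Single-GPU testing works normally.
2. A hyperparameter question: are freeze and warmup specified in iterations? Should I compute the iteration count from my own dataset size and batch size, for example the number of iterations that corresponds to 10 epochs? The other hyperparameters seem to be given in epochs and only these two seem to be in iterations, which is ambiguous. Could you clarify?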

L1aoXingyu commented 3 years ago

@sky186

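  1. I will test this tomorrow.
  2. Setting freeze and warmup in iterations is more reasonable. In training settings for larger datasets, such as face recognition, the whole run is only 16 epochs, so warmup cannot be set in epochs: the warmup may need to be shorter than 1 epoch. The more reasonable approach is to set a number of training iterations directly. The config file also makes this clear with the names WARMUP_ITER and MAX_EPOCH.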
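To turn an epoch-based intuition into an iteration count for freeze or warmup, something like the following can be used. This is a sketch under stated assumptions: the helper names are hypothetical, the incomplete last batch of each epoch is assumed to be dropped, and the dataset size and batch size in the example are made-up numbers:

```python
def iters_per_epoch(num_images: int, batch_size: int) -> int:
    """Iterations in one epoch, assuming the incomplete last batch is dropped."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    return num_images // batch_size


def epochs_to_iters(epochs: float, num_images: int, batch_size: int) -> int:
    """Convert a (possibly fractional) number of epochs into iterations."""
    return int(round(epochs * iters_per_epoch(num_images, batch_size)))


# Hypothetical dataset: 100000 training images, total batch size 256.
print(iters_per_epoch(100000, 256))        # 390
print(epochs_to_iters(10, 100000, 256))    # 3900 iterations for 10 epochs
print(epochs_to_iters(0.5, 100000, 256))   # 195 iterations for half an epoch
```

The last call shows the case described above: a warmup shorter than one epoch cannot be expressed as a whole number of epochs but is easy to express in iterations.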
sky186 commented 3 years ago

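@L1aoXingyu Hello, does the latest code differ from the previous version anywhere between data processing and feature extraction? I pulled a feature-extraction interface out of the earlier version, and it works and gives correct test results. After switching to a model and config trained with the latest version, the test results on the extracted features are completely wrong, even though all other settings are the same. There is a lot of code and I don't know how to find what might have changed.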

L1aoXingyu commented 3 years ago

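@sky186 Could it be that the model was not actually loaded? Also, I ran a test and multi-GPU testing does work. In multi-GPU testing, only the main process returns the results.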
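One way to check whether a checkpoint really matches a model is to compare parameter names and shapes before trusting the extracted features. The sketch below uses plain dictionaries of shapes in place of real state dicts, so it runs without PyTorch. The function name `diff_checkpoint` is hypothetical, and the only entry taken from this thread is the `heads.classifier.weight` shape mismatch reported in the log above; the other key is a placeholder:

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_checkpoint(model_shapes: Dict[str, Shape],
                    ckpt_shapes: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Report keys that would not be loaded from the checkpoint into the model."""
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    mismatched = sorted(
        k for k in model_shapes
        if k in ckpt_shapes and model_shapes[k] != ckpt_shapes[k]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Shapes as they might be collected from a model and a checkpoint.
model = {"backbone.conv1.weight": (64, 3, 7, 7), "heads.classifier.weight": (0, 2048)}
ckpt = {"backbone.conv1.weight": (64, 3, 7, 7), "heads.classifier.weight": (751, 2048)}

report = diff_checkpoint(model, ckpt)
print(report["mismatched"])  # ['heads.classifier.weight']
```

With PyTorch, the shape dictionaries could be built with `{k: tuple(v.shape) for k, v in state_dict.items()}`. A mismatch on the classifier alone is expected at test time, as in the log above, while missing backbone keys would point to a model that was not loaded.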

sky186 commented 3 years ago

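@L1aoXingyu
1. Yes, you were right. After checking, the model parameters were not actually being loaded. I fixed that and it works now, thank you so much!
2. Thanks for your reply. With multiple GPUs the test result comes back empty. In defaults.py, def test(cls, cfg, model, evaluators=None) has a results variable for the test, and with multiple GPUs it is empty there. Roughly where does the main process return the results you mentioned?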

L1aoXingyu commented 3 years ago

@sky186 Where are you reading the returned test results from? https://github.com/JDAI-CV/fast-reid/blob/25cfa88fd97fbef55abcdd1bf69f2db822306bff/fastreid/evaluation/reid_evaluation.py#L55 The code here means that a non-main process returns an empty {}.
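The behaviour described here can be illustrated with a small simulation. This is only a sketch of the pattern, not the fast-reid implementation: `evaluate` and `report` are hypothetical names, the rank is passed in by hand instead of coming from a distributed runtime, and the metric values are copied from the evaluation log earlier in this thread as placeholders:

```python
from typing import Dict


def evaluate(rank: int) -> Dict[str, float]:
    """Return metrics on the main process (rank 0) and an empty dict elsewhere."""
    if rank != 0:
        return {}
    # Placeholder metrics; a real evaluator would compute these.
    return {"Rank-1": 93.94, "mAP": 81.89}


def report(rank: int) -> str:
    """Caller-side guard: only use the results where they are non-empty."""
    results = evaluate(rank)
    if not results:
        return f"rank {rank}: no results (not the main process)"
    return f"rank {rank}: " + ", ".join(f"{k}={v}" for k, v in results.items())


for rank in range(2):  # simulate a 2-GPU run
    print(report(rank))
```

Code that consumes the test results in a multi-GPU run therefore needs to read them on the main process, or handle the empty dict on the others.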