JDAI-CV / fast-reid

SOTA Re-identification Methods and Toolbox
Apache License 2.0
3.43k stars, 838 forks

unable to reproduce the result of SBS(R50) #54

Closed sijun-zhou closed 4 years ago

sijun-zhou commented 4 years ago

Hi @L1aoXingyu, first of all, I really appreciate your excellent SOTA re-ID work ^_^. I am very interested in this project. I have tried the two SBS(R50) models on the Market1501 dataset, but I cannot reproduce the accuracy listed in the model zoo, which is shown below. I don't know why. Could you please help me? Thanks in advance!


Method | Pretrained | Rank@1 | mAP | mINP
SBS(R50) | ImageNet | 95.4% | 88.2% | 64.8%


I will post my final config.yaml below.

My environment: Python 3.6, 2 titan XP GPUs, cuda 9.0, pytorch 1.5, torchvision 0.6.0

sijun-zhou commented 4 years ago
  1. All settings are the repo defaults, without any changes (I pulled the repo 3 days ago).

config.yaml:

CUDNN_BENCHMARK: true
DATALOADER:
  NUM_INSTANCE: 16
  NUM_WORKERS: 0
  PK_SAMPLER: true
DATASETS:
  NAMES:

Best Result: image

L1aoXingyu commented 4 years ago

@sijun-zhou are you training SBS with 2 GPUs?

sijun-zhou commented 4 years ago
  1. I changed "HEADS:CLS_LAYER: circle" to "HEADS:CLS_LAYER: linear". All other settings are the same.

config.yaml:

CUDNN_BENCHMARK: true
DATALOADER:
  NUM_INSTANCE: 16
  NUM_WORKERS: 0
  PK_SAMPLER: true
DATASETS:
  NAMES:

Best Result:

image

sijun-zhou commented 4 years ago

> @sijun-zhou are you training SBS with 2 GPUs?

@L1aoXingyu Yes, with 2 GPUs.

L1aoXingyu commented 4 years ago

@sijun-zhou you need to train with 1 GPU to reproduce the result, or else use syncBN. However, there is currently something wrong with syncBN; I will fix it tomorrow.

sijun-zhou commented 4 years ago

> @sijun-zhou you need to train with 1 GPU, then you can reproduce the result.

@L1aoXingyu I'll try. thx!

L1aoXingyu commented 4 years ago

@sijun-zhou feel free to ask me anything here

L1aoXingyu commented 4 years ago

@sijun-zhou btw, you need to change HEADS.CLS_LAYER to circle to reproduce the result. Please keep everything else in the config the same.

sijun-zhou commented 4 years ago

> @sijun-zhou btw, you need to change HEADS.CLS_LAYER to circle to reproduce the result. Please keep everything else in the config the same.

Got it. Training is in progress now. I will post any results here later.

sijun-zhou commented 4 years ago

@L1aoXingyu I have reproduced the results of SBS(R50) and SBS_R50-ibn on Market1501 and got nearly the same high accuracy as listed in the model zoo. Thank you very much for the advice!

BTW, why does training on one GPU give higher accuracy, and why does the accuracy drop with two GPUs?

L1aoXingyu commented 4 years ago

This is because of BN. When you train on 2 GPUs with a total batch size of 64, the normalization batch size is 32, because each GPU computes the BN batch mean and batch variance from only its own 32 images (2 IDs). These statistics are biased. If you want to train on 2 GPUs with a batch size of 64, change NORM: BN to NORM: syncBN in the config file; the batch mean and variance will then be computed across both GPUs, so the normalization batch size is still 64. Alternatively, change the batch size to 128, which ensures that the normalization batch size on each GPU is 64.
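To illustrate the point about biased statistics, here is a minimal pure-Python sketch. It is not fast-reid's code or PyTorch's SyncBatchNorm implementation; the toy activations and the helper names `batch_stats` and `synced_stats` are assumptions made up for this example. It compares the mean and variance each GPU would see on its own 32-image shard with the statistics of the full 64-image batch, and shows that combining per-shard sums recovers the full-batch values.

```python
from statistics import fmean, pvariance


def batch_stats(values):
    """Mean and biased (population) variance over one batch of scalars."""
    return fmean(values), pvariance(values)


def synced_stats(shards):
    """Combine per-shard counts, sums and squared sums into global statistics.

    This mimics the idea of synchronized BN: every device contributes its
    partial sums, and the mean/variance are computed over all samples.
    """
    n = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(x * x for x in s) for s in shards)
    mean = total / n
    return mean, total_sq / n - mean * mean


# Toy activations: 4 IDs x 16 images = 64 values, each ID with its own offset.
activations = [float(identity) + 0.01 * i
               for identity in range(4) for i in range(16)]

# 2 GPUs: each one receives 32 images, i.e. only 2 of the 4 IDs.
shards = [activations[:32], activations[32:]]

full_mean, full_var = batch_stats(activations)
per_gpu = [batch_stats(s) for s in shards]
sync_mean, sync_var = synced_stats(shards)

print("full batch :", full_mean, full_var)
print("per GPU    :", per_gpu)
print("synced     :", sync_mean, sync_var)
```

In this toy setup each shard's mean and variance differ noticeably from the full-batch values because each shard contains only half of the identities, while the synchronized statistics agree with the full batch up to floating-point error.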

zhanghongruiupup commented 4 years ago

> This is because of BN. When you train on 2 GPUs with a total batch size of 64, the normalization batch size is 32, because each GPU computes the BN batch mean and batch variance from only its own 32 images (2 IDs). These statistics are biased. If you want to train on 2 GPUs with a batch size of 64, change NORM: BN to NORM: syncBN in the config file; the batch mean and variance will then be computed across both GPUs, so the normalization batch size is still 64. Alternatively, change the batch size to 128, which ensures that the normalization batch size on each GPU is 64.

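Hello, you mentioned that circle loss works best with 4 IDs and 16 images per ID. I now have 4 GPUs and plan to set NORM: BN with a batch size of 256, so that each GPU gets exactly 64 images. That should not be affected by circle loss, right? Thanks.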
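The batch-size arithmetic used in this thread can be checked with a small sketch. It assumes that the total batch is split evenly across GPUs and that every identity contributes NUM_INSTANCE = 16 images, as in the config posted above; the helper name `per_gpu_batch` is hypothetical and not part of fast-reid. It only computes the per-GPU normalization batch size for plain BN and says nothing about how circle loss itself behaves.

```python
def per_gpu_batch(total_batch, num_gpus, num_instance=16):
    """Return (images per GPU, IDs per GPU) for an evenly split batch.

    With plain (non-synchronized) BN, the images-per-GPU value is the
    normalization batch size each device sees.
    """
    if total_batch % num_gpus != 0:
        raise ValueError("total_batch must be divisible by num_gpus")
    images = total_batch // num_gpus
    ids = images // num_instance
    return images, ids


# Setups mentioned in the thread: (total batch size, number of GPUs).
setups = [(64, 1), (64, 2), (128, 2), (256, 4)]
results = {setup: per_gpu_batch(*setup) for setup in setups}

for (total, gpus), (images, ids) in results.items():
    print(f"batch {total} on {gpus} GPU(s): {images} images, {ids} IDs per GPU")
```

Under these assumptions, only the 2-GPU run with a total batch of 64 drops to 32 images (2 IDs) per GPU; the other three setups give each GPU 64 images (4 IDs).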