Results cannot be reproduced - Githubissues

woocoder commented 2 years ago

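Were the results in the paper obtained by training with this configuration? https://github.com/DoubtedSteam/MPANet/blob/main/configs/SYSU.yml On my own machine the results come out about 5 points lower than the paper.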

DoubtedSteam commented 2 years ago

Yes. This method's performance does fluctuate, but not by as much as 5 points. Please provide more details.

woocoder commented 2 years ago

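Thanks for the reply! Here are the training parameters and results. I am currently testing different seeds, but the results all land at roughly this level.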

2022-06-29 23:05:58,575 {'batch_size': 128, 'center': False, 'center_cluster': True, 'classification': True, 'color_jitter': False, 'data_root': 'configs/dataset/SYSU-MM01', 'dataset': 'sysu', 'drop_last_stride': True, 'eval_interval': 5, 'fp16': True, 'image_size': (384, 128), 'k_size': 8, 'log_period': 150, 'lr': 0.00035, 'lr_step': [80, 120], 'margin': 0.7, 'modality_attention': 2, 'mutual_learning': True, 'num_cam': 6, 'num_epoch': 140, 'num_id': 395, 'num_parts': 6, 'optimizer': 'adam', 'p_size': 16, 'padding': 10, 'pattern_attention': True, 'prefix': 'SYSU', 'random_crop': True, 'random_erase': True, 'random_flip': True, 'rerank': False, 'resume': '', 'sample_method': 'identity_random', 'start_eval': 115, 'triplet': False, 'update_rate': 0.2, 'wd': 0.0005, 'weight_KL': 2.5, 'weight_sep': 0.5, 'weight_sid': 0.5}

2022-06-30 00:32:46,902 all num-shot:1 r1 precision = 65.64 , r10 precision = 94.90 , r20 precision = 97.94, mAP = 62.90
2022-06-30 00:34:22,785 all num-shot:10 r1 precision = 73.18 , r10 precision = 97.05 , r20 precision = 99.01, mAP = 57.49
2022-06-30 00:34:28,690 indoor num-shot:1 r1 precision = 71.91 , r10 precision = 97.63 , r20 precision = 99.49, mAP = 76.69
2022-06-30 00:35:03,715 indoor num-shot:10 r1 precision = 79.98 , r10 precision = 99.04 , r20 precision = 99.86, mAP = 70.28
2022-06-30 00:35:03,780 Engine run complete. Time taken: 01:29:00

woocoder commented 2 years ago

> Yes. This method's performance does fluctuate, but not by as much as 5 points. Please provide more details.

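I tried several seeds. On the all-search single-shot task, mAP mostly fluctuates between 62 and 64, which is still some way off the 68 reported in the paper.

Could you also share a checkpoint? Thanks.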

DoubtedSteam commented 2 years ago

On a new machine, I re-downloaded the project and reproduced the experiment. The results were:

all num-shot:1 r1 precision = 68.78 , r10 precision = 95.00 , r20 precision = 98.28, mAP = 65.96
all num-shot:10 r1 precision = 74.68 , r10 precision = 97.00 , r20 precision = 99.24, mAP = 61.05
indoor num-shot:1 r1 precision = 74.87 , r10 precision = 98.06 , r20 precision = 99.48, mAP = 79.29
indoor num-shot:10 r1 precision = 82.97 , r10 precision = 99.28 , r20 precision = 99.87, mAP = 74.03

GPU: 2080 Ti

Environment:

Name Version Build Channel
_libgcc_mutex 0.1 main defaults
addict 2.2.1 pypi_0 pypi
apex 0.1 pypi_0 pypi
bezier 2021.2.12 pypi_0 pypi
blas 1.0 mkl defaults
brotlipy 0.7.0 py37hb5d75c8_1001 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
bzip2 1.0.8 h7b6447c_0 defaults
ca-certificates 2020.12.5 ha878542_0 conda-forge
cairo 1.14.12 h8948797_3 defaults
certifi 2020.12.5 py37h89c1867_1 conda-forge
cffi 1.14.0 py37he30daa8_1 defaults
chardet 3.0.4 pypi_0 pypi
click 7.1.2 pypi_0 pypi
cryptography 3.2.1 py37hc72a4ac_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
cudatoolkit 10.0.130 0 defaults
cycler 0.10.0 py37_0 defaults
dbus 1.13.16 hb2f20db_0 defaults
dominate 2.4.0 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
expat 2.2.9 he6710b0_2 defaults
ffmpeg 4.0 hcdf2ecd_0 defaults
filelock 3.0.12 pypi_0 pypi
fontconfig 2.13.0 h9420a91_0 defaults
freeglut 3.0.0 hf484d3e_5 defaults
freetype 2.10.2 h5ab3b9f_0 defaults
fvcore 0.1.2.post20210115 pyhd8ed1ab_0 conda-forge
glib 2.65.0 h3eb4bd4_0 defaults
graphite2 1.3.14 h23475e2_0 defaults
gst-plugins-base 1.14.0 hbbd80ab_1 defaults
gstreamer 1.14.0 hb31296c_0 defaults
h5py 2.8.0 py37h989c5e5_3 defaults
harfbuzz 1.8.8 hffaf4a1_0 defaults
hdf5 1.10.2 hba1933b_1 defaults
icu 58.2 he6710b0_3 defaults
idna 2.10 pyh9f0ad1d_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ignite 0.2.1 py37_0 pytorch
intel-openmp 2020.1 217 defaults
jasper 2.0.14 h07fcdf6_1 defaults
joblib 0.16.0 py_0 defaults
jpeg 9b h024ee3a_2 defaults
jsonpatch 1.24 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
jsonpointer 2.0 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
kiwisolver 1.2.0 py37hfd86e86_0 defaults
lcms2 2.11 h396b838_0 defaults
ld_impl_linux-64 2.33.1 h53a641e_7 defaults
libedit 3.1.20191231 h14c3975_1 defaults
libffi 3.3 he6710b0_2 defaults
libgcc-ng 9.1.0 hdf63c60_0 defaults
libgfortran-ng 7.3.0 hdf63c60_0 defaults
libglu 9.0.0 hf484d3e_1 defaults
libopencv 3.4.2 hb342d67_1 defaults
libopus 1.3.1 h7b6447c_0 defaults
libpng 1.6.37 hbc83047_0 defaults
libsodium 1.0.10 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
libstdcxx-ng 9.1.0 hdf63c60_0 defaults
libtiff 4.1.0 h2733197_1 defaults
libuuid 1.0.3 h1bed415_2 defaults
libvpx 1.7.0 h439df22_0 defaults
libxcb 1.14 h7b6447c_0 defaults
libxml2 2.9.10 he19cac6_1 defaults
lz4-c 1.9.2 he6710b0_0 defaults
matplotlib 3.2.2 0 defaults
matplotlib-base 3.2.2 py37hef1b27d_0 defaults
mkl 2020.1 217 defaults
mkl-service 2.3.0 py37he904b0f_0 defaults
mkl_fft 1.1.0 py37h23d657b_0 defaults
mkl_random 1.1.1 py37h0573a6f_0 defaults
mmcv 1.1.1 pypi_0 pypi
munch 2.5.1.dev12 pypi_0 pypi
ncurses 6.2 he6710b0_1 defaults
ninja 1.9.0 py37hfd86e86_0 defaults
numpy 1.20.2 pypi_0 pypi
olefile 0.46 py37_0 defaults
opencv 3.4.2 py37h6fd60c2_1 defaults
opencv-python 4.4.0.42 pypi_0 pypi
openssl 1.1.1h h516909a_0 conda-forge
packaging 20.4 pypi_0 pypi
pandas 1.1.3 py37he6710b0_0 anaconda
pcre 8.44 he6710b0_0 defaults
pillow 7.2.0 py37hb39fc2d_0 defaults
pip 20.1.1 py37_1 defaults
pixman 0.40.0 h7b6447c_0 defaults
portalocker 1.7.0 py37h89c1867_1 conda-forge
protobuf 3.14.0 pypi_0 pypi
py-opencv 3.4.2 py37hb342d67_1 defaults
pycparser 2.20 py_2 defaults
pyopenssl 20.0.1 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pyparsing 2.4.7 py_0 defaults
pyqt 5.9.2 py37h05f1152_2 defaults
pysocks 1.7.1 py37h89c1867_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
python 3.7.7 hcff3b4d_5 defaults
python-dateutil 2.8.1 py_0 defaults
python_abi 3.7 1_cp37m conda-forge
pytorch 1.2.0 py3.7_cuda10.0.130_cudnn7.6.2_0 pytorch
pytz 2020.1 py_0 anaconda
pyyaml 5.3.1 py37h7b6447c_1 defaults
pyzmq 17.1.2 py37hae99301_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
qt 5.9.7 h5867ecd_1 defaults
readline 8.0 h7b6447c_0 defaults
regex 2020.11.13 pypi_0 pypi
requests 2.25.0 pypi_0 pypi
sacremoses 0.0.43 pypi_0 pypi
scikit-learn 0.23.1 py37h423224d_0 defaults
scipy 1.5.0 py37h0b6359f_0 defaults
seaborn 0.11.0 py_0 anaconda
sentencepiece 0.1.91 pypi_0 pypi
setuptools 49.2.0 py37_0 defaults
sip 4.19.8 py37hf484d3e_0 defaults
six 1.15.0 py_0 defaults
sqlite 3.32.3 h62c20be_0 defaults
tabulate 0.8.7 pyh9f0ad1d_0 conda-forge
tensorboardx 2.1 pypi_0 pypi
termcolor 1.1.0 py_2 conda-forge
thop 0.0.31-2005241907 pypi_0 pypi
threadpoolctl 2.1.0 pyh5ca1d4c_0 defaults
tk 8.6.10 hbc83047_0 defaults
tokenizers 0.9.3 pypi_0 pypi
torchfile 0.1.0 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
torchvision 0.4.0 py37_cu100 pytorch
tornado 6.0.4 py37h7b6447c_1 defaults
tqdm 4.53.0 pypi_0 pypi
transformers 3.5.1 pypi_0 pypi
urllib3 1.26.2 pypi_0 pypi
visdom 0.1.8.9 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
websocket-client 0.57.0 py37h89c1867_4 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
wheel 0.34.2 py37_0 defaults
word2vec 0.11.1 pypi_0 pypi
xz 5.2.5 h7b6447c_0 defaults
yacs 0.1.7 pypi_0 pypi
yaml 0.2.5 h7b6447c_0 defaults
yapf 0.30.0 pypi_0 pypi
zeromq 4.2.5 hfc679d8_4 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
zlib 1.2.11 h7b6447c_3 defaults
zstd 1.4.5 h0b5b093_0 defaults

Checkpoint link: https://pan.baidu.com/s/1UYxVR5S1aBEbdctw7g5Wmw (extraction code: zhcq)
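
When comparing a local setup against an environment listing like the one above, a quick version check can surface differences before a long training run. The following is only a sketch: the helper names are hypothetical, the three pinned versions are copied from the listing, and the PyPI distribution names (`pytorch-ignite` for ignite, `torch` for pytorch) are assumptions about how the packages were installed locally.

```python
from importlib import metadata

# Versions copied from the environment listing; the distribution names
# on the left are assumed PyPI names, not taken from the thread.
REFERENCE = {
    "pytorch-ignite": "0.2.1",
    "torch": "1.2.0",
    "torchvision": "0.4.0",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(reference, lookup=installed_version):
    """Map each distribution whose version differs to (expected, found)."""
    mismatches = {}
    for name, expected in reference.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `find_mismatches(REFERENCE)` returns an empty dict when the three packages match, and otherwise lists each differing package with the expected and installed versions. The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.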

woocoder commented 2 years ago

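Found the problem. It was caused by using version 0.4.9 of the ignite library.

Newer versions of ignite add a variable, self.state.epoch_length, which is computed from the dataloader's length the first time engine.run is called: https://github.com/pytorch/ignite/blob/master/ignite/engine/engine.py#L707

With the default yaml file, the RandomIdentitySampler sampler is used. The first time its iter method is called, the method's internal logic changes the number of samples (from 10,000+ to 20,000+): https://github.com/DoubtedSteam/MPANet/blob/main/data/sampler.py#L158

However, ignite computes the number of iterations from the 10,000+ length that was in effect before the change, so training reads only about half of the samples.

The fix is to define the sampler's length directly in the init method and not change it again in the iter method. The results after the fix are: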

all num-shot:1 r1 precision = 69.67 , r10 precision = 95.15 , r20 precision = 98.09, mAP = 66.87
[2022-07-02 06:58:21] INFO (nni/MainThread) Final result: 66.86681531981273
Final result: 66.86681531981273
all num-shot:10 r1 precision = 74.88 , r10 precision = 96.99 , r20 precision = 99.07, mAP = 61.82
[2022-07-02 06:59:41] INFO (root/MainThread) all num-shot:10 r1 precision = 74.88 , r10 precision = 96.99 , r20 precision = 99.07, mAP = 61.82
indoor num-shot:1 r1 precision = 75.60 , r10 precision = 97.45 , r20 precision = 99.43, mAP = 79.57
[2022-07-02 06:59:46] INFO (root/MainThread) indoor num-shot:1 r1 precision = 75.60 , r10 precision = 97.45 , r20 precision = 99.43, mAP = 79.57
indoor num-shot:10 r1 precision = 82.91 , r10 precision = 99.39 , r20 precision = 99.92, mAP = 74.10

This is essentially consistent with the results in the paper. Thanks to the author for the prompt answers :)
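
For readers who hit the same symptom, the failure mode can be illustrated with a small, dependency-free sketch. This is not the repository's RandomIdentitySampler and not ignite's code: the class names, the `draw_epoch` helper, and the rule of drawing `2 * k_size` indices per identity are all assumptions made up for the illustration. The only point it shows is that a caller which reads `len()` once before iterating gets a stale value if the sampler revises its length inside the iter method.

```python
import random


def draw_epoch(index_by_id, k_size, rng):
    """Draw 2 * k_size indices per identity, with replacement (illustrative rule)."""
    ids = list(index_by_id)
    rng.shuffle(ids)
    indices = []
    for pid in ids:
        indices.extend(rng.choices(index_by_id[pid], k=2 * k_size))
    return indices


class LengthChangesInIter:
    """Reports the raw dataset size until the first call to __iter__."""

    def __init__(self, index_by_id, k_size, seed=0):
        self.index_by_id = index_by_id
        self.k_size = k_size
        self.rng = random.Random(seed)
        # Provisional length: number of samples in the dataset.
        self.length = sum(len(v) for v in index_by_id.values())

    def __iter__(self):
        indices = draw_epoch(self.index_by_id, self.k_size, self.rng)
        self.length = len(indices)  # len() becomes correct only from here on
        return iter(indices)

    def __len__(self):
        return self.length


class LengthFixedInInit(LengthChangesInIter):
    """Same sampling, but the epoch size is set once in __init__."""

    def __init__(self, index_by_id, k_size, seed=0):
        super().__init__(index_by_id, k_size, seed)
        self.length = len(index_by_id) * 2 * k_size


def epoch_lengths(sampler):
    """Mimic a loop that reads len() once, then iterates: (planned, yielded)."""
    planned = len(sampler)
    yielded = len(list(sampler))
    return planned, yielded
```

With 10 identities of 3 samples each and `k_size=4`, `epoch_lengths` gives `(30, 80)` for `LengthChangesInIter` and `(80, 80)` for `LengthFixedInInit`. A loop that stops after the planned count would see fewer than half of the yielded samples in the first case.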

DoubtedSteam / MPANet

Results cannot be reproduced #9