Some questions about the model structure and the reappearance of the results.

Liu-1994 commented 4 years ago

Hello, thank you very much for providing the implementation code of the DG-Net model. I encountered some problems during the implementation of the project. I will be honored if you can give me some suggestions.

1. The model structure differs between the provided trained model and the code. I have successfully evaluated the trained DG-Net model that you provide on GitHub. However, when I resumed training from that model, the project reported an error. I found that the extra weights belong to the non_local layer of the ResBlock. However, after I changed the res_type of the ContentEncoder and Decoder, the weights still did not match.
2. The reproduction does not reach the expected mAP. I evaluated the trained DG-Net model that you provide and got an mAP of 0.8609 with alpha set to 0.5. However, when I retrained, I only got an mAP of 0.8466. I had loaded the teacher model, and the config was configs/latest.yaml. Is there anything else I missed? By the way, I used reid_eval/test_2label.py for evaluation.

I will be grateful if you can give me some suggestions. Thank you!

layumi commented 4 years ago

Hi @Liu-1994

1. This is because I added the non-local layer to the model before the paper submission, but we did not use it. When we decided to release our code, we simplified it and removed the non-local layers, which take extra GPU memory. You may try loading the model with 'strict=False'.
2. Did you use fp16 or anything else? fp16 leads to a performance drop of 1 percent.
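
To illustrate the 'strict=False' suggestion above, here is a minimal sketch. The two toy modules are hypothetical stand-ins, not the actual DG-Net classes: one mimics a checkpoint that still carries an extra `non_local` layer, the other mimics released code without it.

```python
import torch
import torch.nn as nn


class BlockWithNonLocal(nn.Module):
    """Hypothetical stand-in for the model that produced the checkpoint."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        # Extra layer that is stored in the checkpoint but never called.
        self.non_local = nn.Conv2d(8, 8, kernel_size=1)

    def forward(self, x):
        return self.conv(x)


class Block(nn.Module):
    """Hypothetical stand-in for the released code without the extra layer."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


checkpoint = BlockWithNonLocal().state_dict()
model = Block()

# With the default strict=True, unexpected keys raise a RuntimeError.
try:
    model.load_state_dict(checkpoint)
    strict_failed = False
except RuntimeError:
    strict_failed = True

# With strict=False, matching keys are loaded and the rest are reported.
result = model.load_state_dict(checkpoint, strict=False)
missing = result.missing_keys
unexpected = result.unexpected_keys
print("strict load failed:", strict_failed)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```

Printing the unexpected keys is a quick way to confirm that only the `non_local` weights are being skipped and that nothing the model actually needs is missing.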

Liu-1994 commented 4 years ago

@layumi Thank you for your reply. I now understand the structure of the provided model. About the mAP: I do not think I am using fp16, as the config has apex: false and I have not installed NVIDIA/apex. I have a small question. Since the provided trained model has weights for the non_local layers, would the structure of the model affect the mAP? By the way, what version of PyTorch did you use? According to PyTorch's official documentation, PyTorch 1.1.0 and later versions changed the expected order of lr_scheduler.step() and optimizer.step(), which may affect the reproduction of the results.
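
For reference, the ordering that PyTorch 1.1.0 and later expect can be sketched as follows. The model, optimizer, and schedule here are hypothetical toy choices, not the DG-Net training setup; the point is only that `optimizer.step()` comes before `scheduler.step()`. The PyTorch documentation warns that calling the scheduler first skips the first value of the learning rate schedule.

```python
import torch
import torch.nn as nn

# Toy setup (assumed for illustration only).
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

lrs = []
for epoch in range(4):
    # Record the learning rate actually used during this epoch.
    lrs.append(optimizer.param_groups[0]["lr"])

    loss = model(torch.randn(8, 4)).sum()
    optimizer.zero_grad()
    loss.backward()

    optimizer.step()   # update the weights first
    scheduler.step()   # then advance the schedule

print(lrs)
```

Logging the learning rate per epoch like this under both PyTorch versions is one way to check whether the schedule is shifted by one step.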

layumi commented 4 years ago

@Liu-1994 No. It will not affect the performance, since I did not include the non-local layer in the forward function.
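
A small sketch of why a layer that is registered but never called in forward leaves the output unchanged. The modules below are hypothetical toys, not the DG-Net code: the extra parameters sit in the state dict, but they take no part in the computation and receive no gradient.

```python
import torch
import torch.nn as nn


class WithUnusedLayer(nn.Module):
    """Toy module: non_local is registered but never used in forward."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.non_local = nn.Linear(2, 2)

    def forward(self, x):
        return self.fc(x)


class Plain(nn.Module):
    """Toy module without the extra layer."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


torch.manual_seed(0)
with_extra = WithUnusedLayer()
plain = Plain()
# Copy the shared weights; the non_local keys are ignored.
plain.load_state_dict(with_extra.state_dict(), strict=False)

x = torch.randn(5, 4)
same_output = torch.equal(with_extra(x), plain(x))

# Backpropagate through the model that has the unused layer.
with_extra(x).sum().backward()
used_grad = with_extra.fc.weight.grad
unused_grad = with_extra.non_local.weight.grad  # stays None: never in the graph

print("same output:", same_output)
print("unused layer has gradient:", unused_grad is not None)
```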

I may have used PyTorch 1.0.0 before the paper submission. By the way, how many GPUs do you use? It may affect the performance.

Liu-1994 commented 4 years ago

@layumi Thanks for your reply. I should be using only one GPU, as gpu-ids is set to the single value 0.

NVlabs / DG-Net, issue #39