Hi, may I ask for the details of your reproduction environment?
I tried a single A100 (40 GB) with PyTorch 1.8.0 and CUDA 11.1.1, but the performance is not as good as reported.
e.g.
# MSMT17_V1+Market1501+CUHK_NP --> DukeMTMC-reID, checkpoint.pth.tar
Mean AP: 43.6% (48.8 reported)
CMC Scores:
top-1 62.2% (67.2 reported)
top-5 76.4%
top-10 81.4%
Thanks
Hi @Vincent-TwT ,
Thanks for your interest and sorry about the late reply.
For your first question, utilizing meta-learning indeed greatly increases the computational cost. This is because we need to retain the graph for the meta-train step and compute the high-order gradients of the meta-test loss. The parameter-update step itself has only a slight influence on the training time.
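To make that cost concrete, here is a minimal toy sketch of the second-order pattern (illustrative only, not the repo's actual code; the exact meta objective in the paper may differ):

```python
import torch

# Toy MAML-style meta step (hypothetical example, not this repo's code).
w = torch.randn(10, 1, requires_grad=True)             # toy model parameter
x_tr, y_tr = torch.randn(32, 10), torch.randn(32, 1)   # meta-train batch
x_te, y_te = torch.randn(32, 10), torch.randn(32, 1)   # meta-test batch
inner_lr, outer_lr = 0.01, 0.001

# Meta-train loss; create_graph=True retains the graph of this gradient,
# which is exactly what increases memory use and training time.
mtr_loss = ((x_tr @ w - y_tr) ** 2).mean()
(grad_w,) = torch.autograd.grad(mtr_loss, w, create_graph=True)

# "Virtual" updated parameter used only for the meta-test forward pass.
w_fast = w - inner_lr * grad_w

# The meta-test loss backpropagates through w_fast back to w, producing
# the high-order gradient term.
mte_loss = ((x_te @ w_fast - y_te) ** 2).mean()
(meta_grad,) = torch.autograd.grad(mte_loss, w)

with torch.no_grad():
    w -= outer_lr * meta_grad  # plain SGD update, for illustration only
```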
For your second question, I have never seen an error like that. It is better to provide the full traceback.
For your third question, I assume you used CUHK-NP as one of the source domains? We provide the results of using MSMT17_V1 and CUHK-NP as the source domains in the latest version on arXiv, e.g., MS + C + D --> M reaches 52.5 mAP. For our ablation studies, we use CUHK03 (old protocol) and MSMT17_V2 as the source domains. As the baseline, we aggregate all the source domains together without domain labels and train a single model. Our meta-learning strategy includes both balanced sampling of the source domains and the high-order gradients. With a ResNet-50 backbone, the baseline achieves 41.1 mAP, baseline + our meta-sampling achieves 46.4 mAP, baseline + meta-learning strategy achieves 47.4 mAP, and our full model achieves 48.1 mAP. We found that the high-order gradients have less influence with IBN-Net because instance normalization reduces the domain gaps.
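As an aside, the balanced-sampling idea can be illustrated with a short hypothetical sketch (this is not the repo's sampler, just the general pattern of drawing the same number of samples from each source domain per mini-batch):

```python
import random

# Hypothetical balanced sampler over source domains (illustrative sketch only).
def balanced_domain_batch(domain_to_indices, per_domain=16):
    """domain_to_indices: dict mapping domain name -> list of sample indices.
    Returns one batch containing the same number of samples per domain."""
    batch = []
    for domain, indices in domain_to_indices.items():
        batch.extend(random.sample(indices, per_domain))
    random.shuffle(batch)
    return batch

# Example: three source domains, batch of 3 * 16 = 48 samples.
domains = {
    "msmt17": list(range(0, 1000)),
    "cuhk":   list(range(1000, 2000)),
    "market": list(range(2000, 3000)),
}
batch_indices = balanced_domain_batch(domains, per_domain=16)
```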
Hi @liqilei ,
Thank you for your interest.
As noted in the readme, we use three 2080Ti GPUs for training the model. The ReID model is sensitive to the number of GPUs, probably due to BN. I do not have a machine to run the code on a single GPU, so please reproduce our results with three GPUs.
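For intuition on the BN sensitivity, here is a generic illustration (nothing specific to this repo): with nn.DataParallel, each replica's BatchNorm normalizes over only batch_size / num_gpus samples, so the statistics shift when the GPU count changes. SyncBatchNorm is one common generic option to decouple the two; it is shown below purely as an example, not as something the repo uses.

```python
import torch.nn as nn

# Generic illustration: with nn.DataParallel, each GPU's BatchNorm sees only
# batch_size / num_gpus samples, so its statistics depend on the GPU count.
batch_size, num_gpus = 64, 3
print(f"samples per BN layer per step: {batch_size // num_gpus} instead of {batch_size}")

# Synchronized BN computes statistics over the whole batch across GPUs
# (requires DistributedDataParallel at train time).
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16))
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```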
Thanks for your reply. Based on your guidance, I tried three V100 16 GB GPUs and the reproduced results seem reasonable (see below).
Mean AP: 52.7% (52.5 reported)
CMC Scores:
top-1 78.2% (78.3 reported)
top-5 89.3%
top-10 93.1%
OK. Your group is really rich :)
Hi, may I further ask whether you hit CUDA OOM (out-of-memory) errors when training with three GPUs (in my case, three V100 16 GB)? I consistently encounter this with the default batch size (64), even though I noticed the torch.cuda.empty_cache() call in main.py, which does not seem to help.
Do you mean OOM as in out of memory? I use three 2080Ti GPUs for training, with 11 GB of memory each, and I do not encounter that problem. I think three 16 GB V100s are enough for batch size 64.
I am not sure, but if you are not using PyTorch 1.3.1, maybe you could try changing the PyTorch version?
BTW, how did you run the code two weeks ago? Is anything different?
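In the meantime, a generic memory check might help narrow it down (standard PyTorch calls, nothing specific to this repo): torch.cuda.empty_cache() only returns cached-but-unused blocks to the driver and does not shrink the memory held by live tensors, which is likely why it did not help.

```python
import torch

# Generic CUDA memory check (not specific to this repo). Run it around one
# training iteration to see how close the peak gets to the 16 GB limit.
torch.cuda.reset_peak_memory_stats()

# ... one forward/backward/optimizer step here ...

peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
reserved_gib = torch.cuda.memory_reserved() / 1024 ** 3
print(f"peak allocated: {peak_gib:.2f} GiB, reserved by the allocator: {reserved_gib:.2f} GiB")

# If the peak is near the card's capacity, lowering the batch size
# (e.g. 64 -> 48 or 32) is the simplest mitigation.
```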
Yes, I mean out of memory. Things are really weird: I just tried running on three 1080Ti GPUs and it works,
but on three V100s it fails.
The only difference may be that two weeks ago I was running on an x86_64 platform, while today I am using an IBM ppc64 machine (POWER9 chips). I am not sure whether this is related.
As for the PyTorch version, on the IBM system both 1.3.1 and the latest (1.9) hit OOM.
BTW, how long does one epoch (batch size 64) take when training on three 2080Ti GPUs?
I am sorry, I do not have a V100 or an IBM machine, and I have never met such a problem. It is probably due to the platform.
One epoch takes about half an hour on three 2080Ti GPUs.
Got it, many thanks for the help.
Hello, I am very interested in your work. I ran your code and have a few questions:
1. I see that your code implements some network layers itself, which differ from the standard torch.nn layers in that the layer parameters can be plain tensors instead of nn.Parameter, so that meta-learning can be performed. But I found that the training speed differs greatly with and without meta-learning; that is, training with meta-learning becomes very slow. Do I have to register the layer weights as buffers? Have you tried directly taking the gradient of mteloss and then writing an optimizer that updates the model parameters directly with the gradient of mteloss and grad_info (the mtrloss gradient)?
Thanks for reading!
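For readers wondering what pattern the question refers to, here is a small hypothetical layer in the same spirit (illustrative only, not the repo's implementation): the forward pass can accept externally updated weight tensors, which is what keeps the inner-loop update inside the autograd graph.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical "meta" linear layer (illustrative only): the forward pass can
# use an externally supplied weight tensor, so an inner-loop update like
# w_fast = w - lr * grad stays differentiable.
class MetaLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, weight=None, bias=None):
        # Use the updated (plain tensor) parameters if provided.
        w = self.weight if weight is None else weight
        b = self.bias if bias is None else bias
        return F.linear(x, w, b)

layer = MetaLinear(10, 1)
x, y = torch.randn(8, 10), torch.randn(8, 1)

# Inner (meta-train) step; create_graph=True keeps the update differentiable.
loss = F.mse_loss(layer(x), y)
gw, gb = torch.autograd.grad(loss, (layer.weight, layer.bias), create_graph=True)
w_fast, b_fast = layer.weight - 0.01 * gw, layer.bias - 0.01 * gb

# Meta-test loss through the updated tensors; its gradient w.r.t. the original
# parameters contains the second-order term discussed above.
meta_loss = F.mse_loss(layer(x, w_fast, b_fast), y)
meta_loss.backward()
```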