HeliosZhao / M3L

PyTorch implementation for M^3L. CVPR 2021

a few questions #1

Closed WentaoTan closed 3 years ago

WentaoTan commented 3 years ago

Hello, I am very interested in your work. I ran your code and have a few questions:

1. I see that your code implements some of its network layers itself, and they differ from the standard torch.nn layers: the layer parameters can be plain tensors instead of nn.Parameter, which is what makes the meta-learning update possible. However, I found that the training speed differs greatly with and without meta-learning; training with meta-learning becomes very slow. Do I have to write the network layers with buffers like this? Have you tried directly taking the gradient of the meta-test loss and writing an optimizer that updates the model parameters with that gradient together with grad_info (the meta-train gradient)?
2. When I run your code, there is a feature-fusion step in the meta-test phase, where the fused feature is sampled from a Normal distribution. I found that this sometimes raises the error "the parameter scale has invalid values", and the larger the learning rate, the more likely the error is to occur. Have you ever met this?
3. Because of the error in question 2, I deleted the feature-fusion code, and the results I get are much higher than in your article; for example, MS + C + D → M reaches mAP = 52.1%. At the same time, I also found that without meta-learning I can reach mAP = 52.0%.

Thanks for reading!

liqilei commented 3 years ago

Hi, may I ask for the details of your reproduction environment?

I tried with a single A100 (40 GB), PyTorch 1.8.0, and CUDA 11.1.1, but the performance is not as good as reported.

e.g.

# MSMT17_V1+Market1501+CUHK_NP --> DukeMTMC-reID, checkpoint.pth.tar
Mean AP: 43.6%  (48.8 reported)
CMC Scores:
  top-1          62.2% (67.2 reported)
  top-5          76.4%
  top-10         81.4%

Thanks

HeliosZhao commented 3 years ago

Hi @Vincent-TwT ,

Thanks for your interest and sorry about the late reply.

For your first question, utilizing meta-learning indeed greatly increases the computational cost. This is because we need to retain the graph for the meta-train model and calculate the high-order gradients of the meta-test loss. The parameter update itself has only a slight influence on the training time.
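
A minimal sketch of this pattern (illustrative only, not the actual code in this repo): the meta-train gradient step is computed with create_graph=True so the graph is kept alive, and the meta-test loss then backpropagates through that inner update, producing second-order gradients. That second backward pass is where most of the extra time goes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal meta-train / meta-test sketch with higher-order gradients.
# All names (model, x_mtr, y_mtr, ...) are illustrative, not from the M3L code base.
model = nn.Linear(128, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
inner_lr = 0.01

x_mtr, y_mtr = torch.randn(32, 128), torch.randint(0, 10, (32,))  # meta-train batch
x_mte, y_mte = torch.randn(32, 128), torch.randint(0, 10, (32,))  # meta-test batch

params = list(model.parameters())

# Meta-train: compute gradients while RETAINING the graph (create_graph=True).
# This retained graph is what makes the later backward pass second-order and slow.
mtr_loss = F.cross_entropy(model(x_mtr), y_mtr)
grads = torch.autograd.grad(mtr_loss, params, create_graph=True)

# The "updated" weights are plain tensors (not nn.Parameter), which is why the
# repo re-implements layers that accept tensors as their weights.
fast_weights = [p - inner_lr * g for p, g in zip(params, grads)]

# Meta-test: evaluate the model with the fast weights. For nn.Linear this is a
# simple functional call; the custom layers in the repo do the analogous thing.
mte_loss = F.cross_entropy(F.linear(x_mte, fast_weights[0], fast_weights[1]), y_mte)

# Backward through the combined loss traverses the inner update as well,
# i.e. it computes gradients of gradients before the optimizer step.
opt.zero_grad()
(mtr_loss + mte_loss).backward()
opt.step()
```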

For your second question, I have never seen an error like that. It would be better to provide the full traceback.
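
If it helps: that message looks like the validation error torch.distributions raises when the scale argument of Normal contains non-positive or NaN values, which a large learning rate can easily cause. A minimal sketch of a guard (hypothetical, not code from this repo):

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal

# Hypothetical guard: Normal requires a strictly positive, finite scale, otherwise
# (when validation is enabled) it raises "The parameter scale has invalid values".
def safe_sample(mean: torch.Tensor, raw_scale: torch.Tensor) -> torch.Tensor:
    scale = F.softplus(raw_scale) + 1e-6  # softplus keeps it positive, epsilon keeps it away from 0
    return Normal(mean, scale).rsample()  # rsample keeps the sampling differentiable

# example with dummy statistics
feat_mean = torch.randn(32, 256)
feat_scale = torch.randn(32, 256)  # may be negative before the guard
fused = safe_sample(feat_mean, feat_scale)
```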

For your third question, I assume you use CUHK-NP as one of the source domains? We provide results using MSMT17_V1 and CUHK-NP as the source domains in the latest version on arXiv, e.g., MS + C + D --> M reaches 52.5 mAP. For our ablation studies, we use CUHK03 (old protocol) and MSMT17_V2 as the source domains. As the baseline, we aggregate all the source domains together, without domain labels, to train a single model. Our meta-learning strategy consists of the balanced sampling of source domains and the high-order gradients. With a ResNet-50 backbone, the baseline achieves 41.1 mAP, baseline + our meta-sampling achieves 46.4 mAP, baseline + meta-learning strategy achieves 47.4 mAP, and our full model achieves 48.1 mAP. We found that the high-order gradients have less influence with IBN-Net because instance normalization reduces the domain gaps.
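
To make "balanced sampling of source domains" concrete, here is a rough sketch (the dataset names and data structure are illustrative, not the repo's actual sampler): each mini-batch draws the same number of identities from every source domain, so no single domain dominates the meta-train/meta-test split.

```python
import random

# Illustrative domain-balanced sampling; each domain maps identity -> image indices.
domains = {
    "market1501": {0: [0, 1, 2], 1: [3, 4]},
    "msmt17":     {2: [5, 6], 3: [7, 8, 9]},
    "cuhk":       {4: [10, 11], 5: [12, 13]},
}

def balanced_batch(domains, ids_per_domain=2, imgs_per_id=2):
    """Draw the same number of identities (and images per identity) from every domain."""
    batch = []
    for pid2imgs in domains.values():
        pids = random.sample(list(pid2imgs), min(ids_per_domain, len(pid2imgs)))
        for pid in pids:
            imgs = pid2imgs[pid]
            batch.extend(random.sample(imgs, min(imgs_per_id, len(imgs))))
    return batch

print(balanced_batch(domains))  # e.g. [1, 0, 4, 3, 6, 5, 9, 7, 11, 10, 13, 12]
```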

HeliosZhao commented 3 years ago

Hi @liqilei ,

Thank you for your interest.

As noted in the README, we use three 2080Ti GPUs for training the model. The ReID model is sensitive to the number of GPUs, probably due to BN. I do not have a device to run the code on a single GPU, so please reproduce our results with three GPUs.
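
If three matching GPUs are hard to come by, one possible workaround (my own suggestion, not something this repo ships) is synchronized BatchNorm, which makes the BN statistics independent of how the batch is split across GPUs. Note that it only works under DistributedDataParallel, not DataParallel, so the launch script would need to change accordingly.

```python
import torch.nn as nn

# Hypothetical mitigation: convert every BatchNorm layer to SyncBatchNorm so the
# statistics are computed over the whole batch instead of per-GPU slices.
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.BatchNorm2d(64), nn.ReLU())
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
# Then wrap with DistributedDataParallel (requires a torch.distributed launch):
# model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[local_rank])
```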

liqilei commented 3 years ago

Thanks for your reply. Based on your guidance, I tried with three V100 16 GB GPUs and the reproduced results seem reasonable (see below).

Mean AP: 52.7%  (52.5 reported)
CMC Scores:
  top-1          78.2%   (78.3 reported)
  top-5          89.3%
  top-10         93.1%

HeliosZhao commented 3 years ago

OK. Your group is really rich :)

liqilei commented 3 years ago

Hi, may I further ask whether you have run into a CUDA OOM (out of memory) issue when training with three GPUs (in my case, three 16 GB V100s)? I consistently encounter it with the default batch size (64), even though I noticed the torch.cuda.empty_cache() call in main.py; it does not seem to help.

HeliosZhao commented 3 years ago

Do you mean "OOM" as in out of memory? I use three 2080Ti GPUs for training, with 11 GB of memory each, and I do not encounter that problem. I think three 16 GB V100s are enough for batch size 64.
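
One way to narrow this down (just a debugging suggestion, not something in the repo) is to log the per-GPU peak memory right after a training iteration:

```python
import torch

# Report the peak allocated memory on each visible GPU, then reset the counter
# so the next call measures a fresh high-water mark.
def report_gpu_memory():
    for i in range(torch.cuda.device_count()):
        peak_gb = torch.cuda.max_memory_allocated(i) / 1024 ** 3
        print(f"GPU {i}: peak allocated {peak_gb:.2f} GiB")
        torch.cuda.reset_max_memory_allocated(i)
```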

I am not sure, but if you are not using PyTorch 1.3.1, maybe you can try changing the PyTorch version?

BTW, how did you run the code two weeks ago? Is there anything different?

liqilei commented 3 years ago

Yes, I mean out of memory. Things are really weird: I just tried to run on three 1080Ti GPUs and it works.

But with three V100s it fails.

The only difference may be that two weeks ago I was running on an x86_64-based platform, while today I am on an IBM ppc64 (POWER9) machine. I'm not sure if this is related.

As for the PyTorch version, on the IBM system both 1.3.1 and the latest (1.9) hit OOM.

BTW, how long does one epoch (batch size 64) take when training on three 2080Ti GPUs?

HeliosZhao commented 3 years ago

I am sorry, I do not have a V100 or an IBM machine, and I have never met such a problem. It is probably due to the platform.

It takes about half an hour per epoch on three 2080Ti GPUs.

liqilei commented 3 years ago

Got it. Many thanks for the help.