MohamedTEV / DACA

Detect, Augment, Compose, and Adapt: Four Steps for Unsupervised Domain Adaptation in Object Detection

mAP reproducibility ( Sim10k2Cityscapes ) #6

Open manan-d8 opened 5 months ago

manan-d8 commented 5 months ago

Hello @MohamedTEV,

I am trying to reproduce the results. For the sim10k2cityscapes experiment, I ran the pretraining for 20 epochs and got an mAP of 0.4912 at the last epoch. Now that I have started the DACA adaptation training, I am getting the following mAPs. Is this the usual trend? (Training is still in progress.)

  1. Epoch 1: 36.90% mAP
  2. Epoch 2: 38.40% mAP
  3. Epoch 3: 33.08% mAP...

image

Also, each epoch is taking about 15 to 16 minutes. Could you please confirm whether that is normal? (I am using NVIDIA V100 GPUs.)

Thanks, Manan

MohamedTEV commented 5 months ago

Dear @manan-d8, note that the adaptation process in DACA runs for 50 epochs (the 20 epochs apply only to the source-only pretraining of the model). The image you attached shows 50 epochs, which is correct. For Epoch 3, however, you did not get 33.08% mAP but 3.38% mAP, which indicates that something is wrong.

Looking at the number of images that were loaded in your case (2824), it seems there is an issue with the data you are passing to the model: the images, the labels, or both. You are supposed to get 2975 images. I would recommend checking that the YAML files and the paths in them are correct. Also check the number of classes you specify inside the YAML files. Take your time to explore your data, because the error is definitely there.

In my opinion, since you are using Cityscapes as the target dataset here, you should use only the label file corresponding to the Car class. When you address the city2foggy scenario, however, you can use the labels corresponding to all 8 classes. I think you are using the 8-class labels in sim10k2cityscapes, so double-check this.
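As a rough illustration of the data check described above, here is a minimal sketch that counts the images in a split and tallies the class ids that appear in its label files. It assumes YOLO-style labels (one `.txt` file per image, each line starting with an integer class id); the function name `summarize_split` and the directory paths are hypothetical and are not part of the DACA code.

```python
from collections import Counter
from pathlib import Path

# Assumption: images use one of these extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def summarize_split(image_dir, label_dir):
    """Return (number of images, Counter of class ids) for one dataset split.

    Assumes YOLO-style label files: one line per object, and the first
    whitespace-separated field on each line is the integer class id.
    """
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    images = [p for p in image_dir.rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES]

    class_counts = Counter()
    for label_file in label_dir.rglob("*.txt"):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                class_counts[int(fields[0])] += 1
    return len(images), class_counts


# Example usage (hypothetical paths, adjust to the ones in your YAML file):
# n_images, class_counts = summarize_split("images/train", "labels/train")
# print(n_images, sorted(class_counts))
```

If the image count differs from the expected 2975, or if more than one class id shows up in a split that should contain only Car labels, the paths or label files referenced in the YAML are the first thing to inspect.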

I am attaching a screenshot of my training on a V100. An epoch is supposed to take around 30 to 40 minutes (this also depends on the CPU and RAM). Note, however, that I am using only one GPU.

I hope this helps. Best

image

manan-d8 commented 4 months ago

Hello @MohamedTEV,

First, thank you so much for your detailed response. I figured out that I was ignoring the images that have no labels, so I had fewer images in the target domain. I added them back, and now I have the same 2975 images.
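For anyone hitting the same mismatch, a minimal sketch of how the unlabeled images can be listed is shown here. It assumes that each image has its label at the same relative path under a separate label directory, with a `.txt` extension; the function name `images_without_labels` and the paths are hypothetical and are not taken from the DACA repository.

```python
from pathlib import Path

# Assumption: images use one of these extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def images_without_labels(image_dir, label_dir):
    """Return the sorted list of images that have no matching label file.

    Assumes the label for <image_dir>/sub/name.png lives at
    <label_dir>/sub/name.txt.
    """
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    missing = []
    for image in sorted(image_dir.rglob("*")):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = label_dir / image.relative_to(image_dir).with_suffix(".txt")
        if not label.is_file():
            missing.append(image)
    return missing


# Example usage (hypothetical paths):
# missing = images_without_labels("images/train", "labels/train")
# print(len(missing), "images have no label file")
```

The number of images returned, added to the number of labeled images, should match the total expected for the split.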

My next question concerns the KITTI2Cityscapes experiment. When I ignore the images with missing labels in the Cityscapes train and val sets, the results look normal so far (training is in progress). When I include the images with missing labels, however, there are many sudden drops in mAP and a lot of variance during training. Have you observed something similar while working with KITTI?

Best regards, @manan-d8

MohamedTEV commented 4 months ago

Hi @manan-d8

I am happy you could address the previous issue. You're welcome! Regarding the second question, I did not explore that aspect, but I believe it is not that important. I would suggest keeping the same setting as I detailed before and carrying on that way, so that your setup is comparable to DACA.

Best, Mohamed