arthurdouillard / CVPR2021_PLOP

Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation
https://arxiv.org/abs/2011.11390
MIT License

Clarification regarding domain shift experiments on Cityscapes #24

Closed prachigarg23 closed 2 years ago

prachigarg23 commented 2 years ago

Hi @arthurdouillard, I really enjoyed reading your work! Thanks for bringing the domain-shift aspect into CSS. I have a few questions about the implementation of ILT, MiB, and PLOP for the domain-shift experiments on Cityscapes (Table 5):

  1. Regarding PLOP: I'm assuming pseudo-labeling is not applicable in these experiments, since the label space is fixed in the domain-incremental scenario. So do I just use the distillation loss along with the regular cross-entropy (see the sketch after this list)? Is my understanding of how to use PLOP in a domain-IL scenario correct?
  2. MiB modifies the distillation and cross-entropy losses to tackle the background-class shift. Since there is no such shift in the domain-incremental scenario, doesn't their method reduce to ILT (basically LwF)? I'm confused about why there is a difference in performance (e.g. 59% for ILT vs. 61.5% for MiB in the 11-5 case).
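For concreteness, here is a minimal sketch of the objective I have in mind for point 1 (hypothetical model interface that returns logits and a list of intermediate features, not PLOP's actual code):

```python
import torch
import torch.nn.functional as F

def domain_il_loss(new_model, old_model, images, targets, lambda_distill=1.0):
    # The label space is fixed across steps, so the ground-truth targets are
    # used directly: no pseudo-labeling of old classes by the previous model.
    logits_new, feats_new = new_model(images)
    with torch.no_grad():
        logits_old, feats_old = old_model(images)

    loss_ce = F.cross_entropy(logits_new, targets)

    # Feature-level distillation; PLOP uses Local POD, but a plain L2 on
    # spatially pooled features stands in for it in this sketch.
    loss_distill = sum(
        F.mse_loss(a.mean(dim=(2, 3)), b.mean(dim=(2, 3)))
        for a, b in zip(feats_new, feats_old)
    )
    return loss_ce + lambda_distill * loss_distill
```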

Also, would it be possible to share the joint-model (traditional segmentation) mIoU you get for Cityscapes with DeepLabV3 and ResNet-101? (I couldn't find this in the paper and wanted to see the drop relative to joint training.)

arthurdouillard commented 2 years ago

Thank you for your interest in my work :)

  1. Yes, you're right.
  2. ILT has the KD loss (which is the same as MiB's KD in this particular experiment), but also an MSE between features; see https://github.com/arthurdouillard/CVPR2021_PLOP/blob/4b4a0605b2afb84f988d246dee9a00e0c7517363/argparser.py#L34. The sketch below illustrates the difference.
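Very roughly (placeholder tensors standing in for the models' outputs, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

# Placeholders for the new and old models' outputs on the same batch.
logits_new = torch.randn(2, 19, 64, 64)
logits_old = torch.randn(2, 19, 64, 64)
feats_new = torch.randn(2, 256, 64, 64)
feats_old = torch.randn(2, 256, 64, 64)

# Output-level KD: identical for ILT and MiB once there is no background
# shift, as in the domain-incremental Cityscapes setting.
loss_kd = F.kl_div(
    F.log_softmax(logits_new, dim=1),
    F.softmax(logits_old, dim=1),
    reduction="batchmean",
)

# ILT additionally penalizes drift of the encoder features (--loss_de):
loss_de = F.mse_loss(feats_new, feats_old)

# Totals: ILT = loss_ce + loss_kd + loss_de, while MiB = loss_ce + loss_kd,
# hence the performance gap between the two.
```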

I don't think I ever ran the joint model on Cityscapes; you're right that it could be useful. If I find some spare GPUs, I'll run this experiment.

prachigarg23 commented 2 years ago

Thank you for the prompt reply!

Actually, I'm working on CSS on Cityscapes. I want to compare the drop in performance relative to the base (joint) model against your method. Let me know if sharing it is possible.

arthurdouillard commented 2 years ago

Hey,

So I didn't have time to run anything new, but I found some existing results:

First of all, in my follow-up paper (https://arxiv.org/abs/2106.15287) I used a resolution of 512x1024 for Cityscapes, while in the original paper (https://arxiv.org/abs/2011.11390) I used 512x512 (which makes less sense, because the images are originally rectangular, not square).

So with 512x1024 and 50 epochs, I got around 58.06, so compare these results with my second paper (https://arxiv.org/abs/2106.15287). This is not super high and we could definitely do better, but I kept the same training schedule used by all models for simplicity.

While it's not directly comparable to the Cityscapes results in the PLOP paper, does that answer your question?

prachigarg23 commented 2 years ago

Hi, thanks for getting back. Yeah, actually I was trying to reproduce the 77% mIoU on Cityscapes, as I need that for my experiments. I'm currently getting 70% and asked for PLOP's result to see in case it was 75%+. But I understand it depends on the training schedule used, so I'm trying the DeepLabV3 paper's hyperparameters. Thanks for your help!

prachigarg23 commented 2 years ago

Hi @arthurdouillard, I have a small doubt. In the LwF and ILT experiments, loss_kd and loss_de are set to 100, which I believe are the weighting factors for the soft cross-entropy and feature-distillation terms in the total loss (see the sketch below for how I read them). But in the LwF and ILT (ICCVW 2019) papers, this loss-balance weight is set to 1, not 100. Is there a reason for this? I was wondering if you could help resolve this confusion.
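For reference, this is how I read the flags entering the total objective, assuming they act as plain multipliers (a hypothetical illustration, not the repo's exact code):

```python
# e.g. --loss_kd 100 --loss_de 100 for the ILT runs
def total_loss(loss_ce, loss_soft_ce, loss_feat_mse, loss_kd=100.0, loss_de=100.0):
    # loss_kd weights the soft cross-entropy (output distillation) term,
    # loss_de weights the feature-space MSE term.
    return loss_ce + loss_kd * loss_soft_ce + loss_de * loss_feat_mse
```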

arthurdouillard commented 2 years ago

For the baselines (like LwF and ILT), all hyperparameters (except the number of epochs) are from Cermelli et al.'s MiB. I didn't tune them, as Cermelli had already tuned them for segmentation (although on a different dataset, I agree).