clovaai / ECLIPSE

(CVPR 2024) ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning

Curiosity about ECLIPSE's network architecture #8

Closed: wendyhuang2021 closed this issue 1 month ago

wendyhuang2021 commented 1 month ago

Hi! May I ask why you adopted the Mask2Former architecture? Is it because of some disadvantage of Mask2Former in continual learning?

qjadud1994 commented 1 month ago

Hi.

There are several reasons why we adopted Mask2Former.

  1. Mask2Former is one of the most representative and widely used recent segmentation models.
  2. Mask2Former is the most suitable transformer-based architecture for applying visual prompt tuning to continual panoptic segmentation.
  3. Mask2Former is powerful and supports universal image segmentation (panoptic, semantic, and instance segmentation tasks).

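
Point 2 can be illustrated with a toy sketch of prompt-based continual learning. It assumes that each incremental step adds a small group of learnable prompt vectors for its new classes while everything learned in earlier steps stays frozen, so only a few parameters are trained per step. One plausible reason Mask2Former suits this is that its learnable object queries give such prompts a natural place to enter the transformer decoder, although the reply does not spell this out. `PromptPool`, `add_step`, the zero initialisation, and `dim=256` are hypothetical choices for illustration, not names or settings taken from the ECLIPSE code.

```python
from dataclasses import dataclass, field


@dataclass
class PromptPool:
    """Hypothetical container holding one group of prompt vectors per learning step."""
    dim: int
    # Each entry: {"classes": [...], "prompts": [[...], ...], "trainable": bool}
    groups: list = field(default_factory=list)

    def add_step(self, new_classes, prompts_per_class=1):
        # Freeze every prompt group learned in earlier steps.
        for group in self.groups:
            group["trainable"] = False
        # Add fresh prompt vectors (zero-initialised here) for the new classes only.
        prompts = [[0.0] * self.dim for _ in range(len(new_classes) * prompts_per_class)]
        self.groups.append({"classes": list(new_classes), "prompts": prompts, "trainable": True})

    def trainable_parameter_count(self):
        # Only the prompts of the current step are updated.
        return sum(len(g["prompts"]) * self.dim for g in self.groups if g["trainable"])

    def total_prompt_count(self):
        return sum(len(g["prompts"]) for g in self.groups)


pool = PromptPool(dim=256)
pool.add_step(["person", "car"])  # base step
pool.add_step(["dog"])            # incremental step: earlier prompts are now frozen
print(pool.trainable_parameter_count())  # 256
print(pool.total_prompt_count())         # 3
```

In a real model the frozen groups would still be used at inference, so earlier classes keep being predicted while the new step only trains its own prompts.
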
wendyhuang2021 commented 1 month ago

My apologies for my poor English. That was not the question I meant to ask. What I really wanted to ask is:

Why does ECLIPSE change the Mask2Former architecture? In the original Mask2Former design, the transformer decoder receives features from the pixel decoder. In ECLIPSE, however, these connections from the pixel decoder seem to be removed and replaced with image embeddings taken directly from the backbone output. Why did you change the Mask2Former architecture? Is it because of some disadvantage of Mask2Former in continual learning?

I based this on the architecture figures of Mask2Former and ECLIPSE. If anything I have written above is incorrect, please correct me.

qjadud1994 commented 1 month ago

The architectural design of ECLIPSE is entirely based on Mask2Former. I think you may have misunderstood our architecture.

[image] [image]
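
For readers following the thread, here is a toy trace of the data flow in Mask2Former as described in the Mask2Former paper: each transformer decoder layer cross-attends to one of the pixel decoder's 1/32, 1/16, and 1/8 feature maps in round-robin order, and masks come from a dot product between the query embeddings and the pixel decoder's 1/4-resolution per-pixel embeddings. Since the reply states that ECLIPSE follows Mask2Former, the same flow presumably applies, but this is an illustration, not code from either repository. The function names, the toy vectors, and the 9-layer default (the standard Mask2Former setting) are assumptions, and masked attention itself is not modelled.

```python
# Toy trace of the Mask2Former data flow (plain lists, no real tensors).
PIXEL_DECODER_STRIDES = (32, 16, 8)  # multi-scale maps fed to the transformer decoder
MASK_FEATURE_STRIDE = 4              # per-pixel embedding used for mask prediction


def transformer_decoder_schedule(num_layers=9):
    """Return, for each decoder layer, the pixel-decoder stride it cross-attends to."""
    return [PIXEL_DECODER_STRIDES[i % len(PIXEL_DECODER_STRIDES)] for i in range(num_layers)]


def predict_masks(query_embeds, pixel_embeds):
    """Mask logits as dot products between query and per-pixel embeddings.

    query_embeds: N vectors of length C (one per query)
    pixel_embeds: H*W vectors of length C (flattened 1/4-resolution map)
    returns: N x (H*W) nested list of logits
    """
    return [[sum(q_c * p_c for q_c, p_c in zip(q, p)) for p in pixel_embeds] for q in query_embeds]


print(transformer_decoder_schedule())
# [32, 16, 8, 32, 16, 8, 32, 16, 8]
queries = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[2.0, 3.0], [-1.0, 4.0], [0.5, 0.5]]
print(predict_masks(queries, pixels))
# [[2.0, -1.0, 0.5], [3.0, 4.0, 0.5]]
```

Every input to the transformer decoder in this trace comes from the pixel decoder, not directly from the backbone, which is the connection the question was about.
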

wendyhuang2021 commented 1 month ago

Oh, I was more familiar with the figure below, but I found that both figures match the code implementation. Thanks for your patient reply!

[image]