random coloring scheme in SegGPT

JunMa11 commented 1 year ago

Thanks for sharing the awesome work!

I do not fully understand the random color scheme in Sec 3.1.

1. randomly sampling another image that shares a similar context with the input image; Q1: Is this step the same as in Painter?
2. randomly sample a set of colors from the target image and map each color to a random one. Q2: What do you mean by sampling colors from the target image? Q3: How is the color mapped to the randomly sampled image?
3. two pairs of images, which are defined as an in-context pair; Q4: In Painter, I know the two pairs are the prompt/input image and its mask, and the target image and its mask. But what are the two pairs here?
4. we introduce the mix-context training method which trains the model using mixed examples. This involves stitching together multiple images with the same color mapping. The resulting image is then randomly cropped and resized to form a mixed-context training sample. Q5: Compared to Painter, do you stitch more than two prompt images in SegGPT?
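
To make the last quoted step concrete, here is a rough sketch of how I read "stitch, randomly crop, resize". It is not taken from the SegGPT code: the function names, the grid layout, and the nearest-neighbor resize are all assumptions of mine, and plain Python lists stand in for image arrays to keep it dependency-free.

```python
import random


def stitch_grid(images, cols):
    """Tile equally sized 2D images (lists of rows) into a grid, row-major."""
    height = len(images[0])
    assert len(images) % cols == 0, "number of images must fill the grid"
    canvas = []
    for start in range(0, len(images), cols):
        group = images[start:start + cols]
        for y in range(height):
            row = []
            for image in group:
                row.extend(image[y])
            canvas.append(row)
    return canvas


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of a 2D image (assumed resize method)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def mixed_context_sample(images, color_maps, cols, crop_hw, out_hw, seed=None):
    """Stitch images and their color maps, then crop and resize both alike."""
    rng = random.Random(seed)
    big_image = stitch_grid(images, cols)
    big_map = stitch_grid(color_maps, cols)
    crop_h, crop_w = crop_hw
    # One crop window is drawn and reused so image and color map stay aligned.
    top = rng.randrange(len(big_image) - crop_h + 1)
    left = rng.randrange(len(big_image[0]) - crop_w + 1)

    def crop(canvas):
        return [row[left:left + crop_w] for row in canvas[top:top + crop_h]]

    out_h, out_w = out_hw
    return (resize_nearest(crop(big_image), out_h, out_w),
            resize_nearest(crop(big_map), out_h, out_w))
```

The point of the sketch is only that the image canvas and the color-map canvas go through the same crop and resize, so the shared color mapping stays consistent across the stitched samples.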

Looking forward to your reply:)

lzl2040 commented 1 year ago

I cannot understand this coloring scheme either.

WXinlong commented 1 year ago

@JunMa11 @lzl2040 Hi,

1. The difference lies in the "context". In Painter, the context means the samples are in the same task, e.g., depth estimation. For SegGPT, the context is more fine-grained, e.g., the same class, instance, etc.
2. For a gt segmentation map, we first convert it to a colormap, i.e., different colors for different targets. We randomly sample a subset of the targets that are shared between in-context samples. For example, both cats are sampled in two images with the same color.
3. It's the same here, i.e., prompt/input image and its colormap, target image and its colormap.
4. Yes, SegGPT supports more prompt images during training by stitching more samples that share a similar context.
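
The coloring step in item 2 can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual training code: the function names, the `keep_prob` parameter, the integer-id label maps, and black as the background color are all hypothetical choices made for the example.

```python
import random


def random_color(rng):
    """Return a random RGB color other than black (assumed background)."""
    while True:
        color = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
        if color != (0, 0, 0):
            return color


def colorize_in_context(label_maps, keep_prob=0.5, seed=None):
    """Recolor several label maps with one shared random color mapping.

    label_maps: list of 2D lists of integer target ids (0 = background).
    Returns (color_maps, mapping) with mapping = {target_id: (r, g, b)}.
    """
    rng = random.Random(seed)
    background = (0, 0, 0)
    # Targets present in every sample form the shared context.
    id_sets = [{v for row in lm for v in row if v != 0} for lm in label_maps]
    shared = sorted(set.intersection(*id_sets)) if id_sets else []
    mapping = {}
    if shared:
        # Randomly keep part of the shared targets, but at least one.
        kept = [t for t in shared if rng.random() < keep_prob]
        kept = kept or [rng.choice(shared)]
        for target in kept:
            color = random_color(rng)
            while color in mapping.values():  # keep colors distinct
                color = random_color(rng)
            mapping[target] = color
    # The same mapping is applied to every sample; the rest is background.
    color_maps = [[[mapping.get(v, background) for v in row] for row in lm]
                  for lm in label_maps]
    return color_maps, mapping
```

With two label maps that both contain target 1 (say, a cat), both cats end up with the same randomly drawn color, while a target that appears in only one of the maps is painted as background.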

Hope it helps!

JunMa11 commented 1 year ago

Thank you very much for your reply :)

JunMa11 commented 1 year ago

Hi @WXinlong

Thanks again for your kind help. I have some follow-up questions.

1. For the mix-context training, does "mix" refer to mixing different images from the same category (e.g., a black cat and a yellow cat) or mixing images from various categories (e.g., cat images and medical images)?
2. The model input is two image-mask pairs and the corresponding shapes are 2H*W*3 and 2H*W*3 for the image and target, respectively. The model output is the expected segmentation.
- 2.1 What is the model output shape: 2H*W*3 or H*W*3?
- 2.2 Since we only have the ground truth (target) of the prompt image, how is the target of the input image initialized?
- 2.3 SegGPT supports more prompt images during training by stitching more samples. Assuming we have 8 prompt images, how are they fed into the pre-trained model? (The model input shapes are 2H*W*3 and 2H*W*3 for the image and target.)
3. For the in-context tuning, could you please explain a little about how the learnable image tensor is optimized?

Looking forward to your reply:)

JunMa11 commented 1 year ago

Hi @WXinlong ,

Any comments are highly appreciated:)

usherbob commented 1 year ago

2. The model input is two image-mask pairs and the corresponding shapes are 2H*W*3 and 2H*W*3 for the image and target, respectively. The model ...

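For 2, does this mean collecting all images that contain the instance and randomly choosing a color from among all these instances?

(Translated from Chinese:) I am confused about how the coloring is changed in SegGPT. Does it mean finding all data in the dataset that share the same context (the same category in semantic segmentation, the same instance in instance segmentation), and then randomly selecting one of the foreground colors from all the data in that context as the final coloring? @JunMa11 @WXinlong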

runjiali-rl commented 10 months ago

I cannot understand how they sample the colors. The description in the paper is very confusing without code...

SteveImmanuel commented 7 months ago

From what I understood, you can simply replace the prompt with a model parameter and optimize it. You can check out my implementation here.
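
To illustrate the idea only (this is not the linked implementation and not SegGPT itself), here is a toy sketch in plain Python: the "model" is a fixed linear map that stands in for the frozen network, and gradient descent updates nothing but the prompt vector. All names (`frozen_model`, `tune_prompt`), the linear model, and the hyperparameters are assumptions made for the example.

```python
def frozen_model(weights, prompt, x):
    """Toy frozen network: out[i] = sum_j weights[i][j] * prompt[j] + x[i]."""
    return [sum(w * p for w, p in zip(row, prompt)) + xi
            for row, xi in zip(weights, x)]


def mse(pred, target):
    """Mean squared error between two equally long vectors."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def tune_prompt(weights, examples, prompt, lr=0.1, steps=200):
    """Gradient descent on the prompt only; `weights` is never modified.

    examples: list of (x, y) pairs, the few labeled samples used for tuning.
    """
    prompt = list(prompt)
    for _ in range(steps):
        grad = [0.0] * len(prompt)
        for x, y in examples:
            pred = frozen_model(weights, prompt, x)
            for i, row in enumerate(weights):
                # d(mse)/d(pred[i]), averaged over the examples
                err = 2.0 * (pred[i] - y[i]) / (len(pred) * len(examples))
                for j, w in enumerate(row):
                    grad[j] += err * w
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt
```

In a framework such as PyTorch, the same pattern would typically mean wrapping the prompt tensor in `torch.nn.Parameter`, freezing the model weights, and passing only the prompt to the optimizer.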

baaivision / Painter