baaivision / Painter

Painter & SegGPT Series: Vision Foundation Models from BAAI
MIT License

random coloring scheme in SegGPT #8

Open JunMa11 opened 1 year ago

JunMa11 commented 1 year ago

Thanks for sharing the awesome work!

I do not fully understand the random color scheme in Sec 3.1.

Looking forward to your reply:)

lzl2040 commented 1 year ago

I cannot understand this coloring scheme either.

WXinlong commented 1 year ago

@JunMa11 @lzl2040 Hi,

  1. The difference lies in the "context". In Painter, the context means the samples are in the same task, e.g., depth estimation. For SegGPT, the context is more fine-grained, e.g., the same class, instance, etc.
  2. For a ground-truth (gt) segmentation map, we first convert it to a colormap, i.e., different colors for different targets. We randomly sample a subset of the targets that are shared between in-context samples. For example, both cats are sampled in two images with the same color.
  3. It's the same here, i.e., prompt/input image and its colormap, target image and its colormap.
  4. Yes, SegGPT supports more prompt images during training by stitching more samples that share a similar context.
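
The coloring step in item 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the released training code: the function name `random_color_pair`, the `keep_prob` parameter, and the convention that label 0 is background are all hypothetical. It takes two integer label maps, keeps a random subset of the targets they share, and paints each kept target with the same random color in both maps.

```python
import numpy as np

def random_color_pair(label_a, label_b, rng, keep_prob=0.5):
    """Color two label maps so that shared targets get the same random color.

    label_a, label_b: integer arrays of shape (H, W); 0 is assumed to be background.
    Returns two uint8 color maps of shape (H, W, 3).
    """
    color_a = np.zeros(label_a.shape + (3,), dtype=np.uint8)
    color_b = np.zeros(label_b.shape + (3,), dtype=np.uint8)

    # Targets (class or instance ids) present in both in-context samples.
    shared = np.intersect1d(label_a[label_a > 0], label_b[label_b > 0])
    if shared.size == 0:
        return color_a, color_b

    # Randomly keep a subset of the shared targets (hypothetical rule: at least one).
    keep = shared[rng.random(shared.size) < keep_prob]
    if keep.size == 0:
        keep = rng.choice(shared, size=1)

    for target in keep:
        # A fresh random color per target, reused across both samples.
        # Channels start at 1 so the color never equals the black background.
        color = rng.integers(1, 256, size=3, dtype=np.uint8)
        color_a[label_a == target] = color
        color_b[label_b == target] = color
    return color_a, color_b

# Usage: target 1 appears in both maps, targets 2 and 3 in only one each.
rng = np.random.default_rng(0)
a = np.array([[0, 1], [2, 2]])
b = np.array([[1, 1], [0, 3]])
color_a, color_b = random_color_pair(a, b, rng)
```

Because the color is drawn anew for every sampled pair, the color itself carries no class meaning; only the agreement between the two in-context samples does. Unshared targets are left as background here, which is a simplification of this sketch.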

Hope it helps!

JunMa11 commented 1 year ago

Thanks very much for your reply :)

JunMa11 commented 1 year ago

Hi @WXinlong

Thanks again for your kind help. I have some follow-up questions.

  1. For the mix-context training, does mix refer to mixing different images from the same category (e.g., black cat and yellow cat) or mixing different images from various categories (e.g., cat and medical images)?

  2. The model input is two image-mask pairs and the corresponding shapes are 2H*W*3 and 2H*W*3 for the image and target, respectively. The model output is the expected segmentation.

    • 2.1 What is the model output shape: 2H*W*3 or H*W*3?
    • 2.2 Since we only have the ground truth (target) of the prompt image, how is the target of the input image initialized?
    • 2.3 SegGPT supports more prompt images during training by stitching more samples. Assuming we have 8 prompt images, how do we feed them into the pre-trained model? (The model input shapes are 2H*W*3 and 2H*W*3 for the image and target.)
  3. For the in-context tuning, could you please explain a little how the learnable image tensor is optimized?
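
For reference, the shapes in question 2 can be sketched as below. This only illustrates the 2H*W*3 layout described above and does not answer 2.1 to 2.3. The helper `stitch`, the zero placeholder for the unknown half of the target, and the boolean mask are assumptions of this sketch, not something confirmed in this thread.

```python
import numpy as np

def stitch(prompt_img, prompt_tgt, query_img):
    """Stack a prompt pair and a query image into 2H x W x 3 canvases."""
    h = prompt_img.shape[0]
    # Image canvas: prompt image on top, query image below -> (2H, W, 3).
    img = np.concatenate([prompt_img, query_img], axis=0)
    # Target canvas: prompt color map on top; the bottom half is unknown,
    # so a zero placeholder is used here (assumption of this sketch).
    tgt = np.concatenate([prompt_tgt, np.zeros_like(prompt_tgt)], axis=0)
    # Mask marking the region to be predicted (the query's half).
    mask = np.zeros(tgt.shape[:2], dtype=bool)
    mask[h:] = True
    return img, tgt, mask

H, W = 8, 6  # tiny hypothetical resolution, for illustration only
prompt_img = np.ones((H, W, 3), dtype=np.float32)
prompt_tgt = np.ones((H, W, 3), dtype=np.float32)
query_img = np.ones((H, W, 3), dtype=np.float32)

img, tgt, mask = stitch(prompt_img, prompt_tgt, query_img)
print(img.shape, tgt.shape, mask.shape)  # (16, 6, 3) (16, 6, 3) (16, 6)

# If a model returned a full 2H x W x 3 canvas, the query's part
# would be the bottom half, of shape H x W x 3.
full_prediction = np.zeros_like(tgt)  # stand-in for a model output
query_prediction = full_prediction[H:]
```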

Looking forward to your reply:)

JunMa11 commented 1 year ago

Hi @WXinlong ,

Any comments are highly appreciated:)

usherbob commented 1 year ago

2. The model input is two image-mask pairs and the corresponding shapes are 2H*W*3 and 2H*W*3 for the image and target, respectively. The model

For 2, does it mean collecting all images that contain the instance and randomly choosing one color for all of these instances?

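In other words: I am confused about how the coloring scheme is changed in SegGPT. Does it mean finding all the data in the dataset that share the same context (the same category in semantic segmentation, the same instance in instance segmentation), and then, across all data of that context, randomly choosing one foreground color as the final coloring? @JunMa11 @WXinlong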

runjiali-rl commented 10 months ago

I cannot understand how they sample the colors. The description in the paper is very confusing without code...

SteveImmanuel commented 7 months ago
  1. From what I understood, you can simply replace the prompt with a model parameter and optimize it. You can check out my implementation here.
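
The idea of treating the prompt as the only trainable parameter can be illustrated with a toy example. This is a minimal sketch under strong assumptions and is not the implementation referred to above: the "model" here is a frozen linear map (`W_p`, `W_q` are made-up weights), the gradient is written by hand, and the dimensions are arbitrary. With the real network one would use a deep learning framework's autograd instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "model" weights (hypothetical stand-in for the pre-trained network).
W_p = rng.normal(size=(6, 4))  # acts on the prompt
W_q = rng.normal(size=(6, 5))  # acts on each query

def frozen_model(prompt, queries):
    """Predict one output vector per query; the weights are never updated."""
    return prompt @ W_p.T + queries @ W_q.T  # shape (N, 6)

def loss_fn(prompt, queries, targets):
    residual = frozen_model(prompt, queries) - targets
    return np.mean(np.sum(residual ** 2, axis=1))

# Toy tuning set generated from an unknown "ideal" prompt.
ideal_prompt = rng.normal(size=4)
queries = rng.normal(size=(32, 5))
targets = frozen_model(ideal_prompt, queries)

# The learnable prompt: the only quantity that gradient descent changes.
prompt = np.zeros(4)
initial_loss = loss_fn(prompt, queries, targets)

# Step size 1/L, where L = 2 * ||W_p||_2^2 bounds the curvature of the loss.
lr = 1.0 / (2.0 * np.linalg.norm(W_p, 2) ** 2)
for _ in range(200):
    residual = frozen_model(prompt, queries) - targets
    grad = 2.0 * residual.mean(axis=0) @ W_p  # d(loss)/d(prompt)
    prompt -= lr * grad

final_loss = loss_fn(prompt, queries, targets)
print(initial_loss, final_loss)
```

Only `prompt` changes during the loop; `W_p` and `W_q` stay fixed, which mirrors the "frozen model, learnable prompt" setup in spirit.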