baaivision / Painter

Painter & SegGPT Series: Vision Foundation Models from BAAI
MIT License
2.53k stars 176 forks

Questions on Painter's inference without groundtruths #30

Open Masaaki-75 opened 1 year ago

Masaaki-75 commented 1 year ago

Hello! I noticed in your code that the model's input is the same during training and inference: paired images imgs, paired labels tgts, and the mask bool_masked_pos. During forward(), the test labels pass through patch_embed before they are masked, so the model can see their information at that point; see the screenshot below: image

This is acceptable during masked image modeling training, but what about inference, when the test image has no label?

I mean, since Painter should not see the information of the test labels, do the paired labels tgts already have all-zero values at the pixels belonging to the test labels? Or do you have another preprocessing strategy for tgts during inference?

I drew a sketch that hopefully makes my question clearer:

image

kssscrl commented 1 year ago

I have the same confusion.

hhd52859 commented 1 year ago

I'll share my personal understanding; please point out any mistakes. patch_embed is just a conv layer whose stride and kernel_size are both equal to patch_size. As a result, the conv kernel sees only the information within a single patch at a time, and no information is exchanged between patches.
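
To make this concrete, here is a minimal pure-Python sketch, not Painter's actual code. It assumes a single-channel image, one scalar token per patch, and that masked tokens are overwritten by a mask token right after the embedding, as described in the question. The names patch_embed, apply_mask, WEIGHT and MASK_TOKEN are hypothetical stand-ins chosen for illustration.

```python
# Toy stand-in for a patch embedding: a "convolution" whose stride equals its
# kernel size, applied to a single-channel 2D image (illustrative assumption).

def patch_embed(image, weight, patch_size):
    """Return one scalar token per non-overlapping patch, in row-major order."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # The kernel only ever touches pixels inside this one patch.
            token = sum(
                weight[i][j] * image[top + i][left + j]
                for i in range(patch_size)
                for j in range(patch_size)
            )
            tokens.append(token)
    return tokens


def apply_mask(tokens, masked, mask_token):
    """Replace the tokens at masked positions with a fixed mask token."""
    return [mask_token if m else t for t, m in zip(tokens, masked)]


PATCH = 2
WEIGHT = [[1.0, 2.0], [3.0, 4.0]]  # arbitrary illustrative kernel
MASK_TOKEN = -1.0                   # stands in for a learned mask token

# A 4x4 "label" image gives a 2x2 grid of patches. The bottom two patches
# play the role of the test label that must stay hidden.
label_with_gt = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [7, 7, 9, 9],
    [7, 7, 9, 9],
]
# Same image, but with the test-label region zeroed out beforehand.
label_zeroed = [row[:] for row in label_with_gt]
for r in (2, 3):
    label_zeroed[r] = [0, 0, 0, 0]

masked = [False, False, True, True]  # mask the two bottom patches

tokens_gt = patch_embed(label_with_gt, WEIGHT, PATCH)
tokens_zero = patch_embed(label_zeroed, WEIGHT, PATCH)

out_gt = apply_mask(tokens_gt, masked, MASK_TOKEN)
out_zero = apply_mask(tokens_zero, masked, MASK_TOKEN)

# The unmasked tokens never depended on the hidden patches, and the masked
# tokens are overwritten, so both inputs give identical results.
print(out_gt == out_zero)
```

Under these assumptions, whatever values sit in the masked patches of tgts cannot influence the tokens passed on after masking, so zeroing them in advance would make no difference.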