Masaaki-75 opened this issue 1 year ago
I have the same confusion.
I'll share my personal understanding; please point out any mistakes. `patch_embed` is just a conv layer whose `stride` and `kernel_size` are both `patch_size`. As a result, each application of the conv kernel sees only the pixels inside a single patch, so no information is exchanged between patches.
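To illustrate this point, here is a minimal pure-Python sketch of a non-overlapping patch embedding, i.e. a convolution with `kernel_size == stride == patch_size`. The name `patch_embed` mirrors the layer discussed in this thread, but the weights, shapes, and single-channel input are illustrative assumptions, not Painter's actual layer. Changing one pixel changes only the token of the patch that contains it.

```python
# Hypothetical sketch: a non-overlapping patch embedding (conv with
# kernel_size == stride == patch_size) on a single-channel image.
# Weights and sizes are arbitrary; this is NOT Painter's actual module.

def patch_embed(img, weight, patch_size):
    """Embed each non-overlapping patch of `img` (an H x W list of lists).

    `weight` holds one patch_size x patch_size kernel per output channel.
    Returns a (H // patch_size) x (W // patch_size) grid of token vectors.
    """
    h, w = len(img), len(img[0])
    grid = []
    for pi in range(0, h - patch_size + 1, patch_size):  # stride == patch_size
        row = []
        for pj in range(0, w - patch_size + 1, patch_size):
            token = []
            for kernel in weight:
                # The kernel only ever touches the pixels of this one patch.
                s = 0.0
                for di in range(patch_size):
                    for dj in range(patch_size):
                        s += kernel[di][dj] * img[pi + di][pj + dj]
                token.append(s)
            row.append(token)
        grid.append(row)
    return grid


if __name__ == "__main__":
    weight = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, -1.0], [1.0, 0.0]]]
    img = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
    before = patch_embed(img, weight, 2)
    img[3][3] += 100.0  # perturb one pixel inside the bottom-right patch
    after = patch_embed(img, weight, 2)
    changed = [(r, c) for r in range(2) for c in range(2) if before[r][c] != after[r][c]]
    print(changed)  # [(1, 1)]: only the bottom-right token changes
```

A real `patch_embed` would typically be a `Conv2d` over multiple input channels plus a flatten, but the locality argument is the same: with stride equal to kernel size, the receptive fields of different tokens do not overlap.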
Hello! I notice in your code that the model's inputs are the same during training and inference, i.e., paired images `imgs`, paired labels `tgts`, and the mask `bool_masked_pos`. During `forward()`, the model can see the information of the test labels before they get masked (with `patch_embed`); see the screenshot below. This is acceptable during masked image modeling training, but what about during inference, when the test image has no label?
I mean, since Painter should not see any information from the test labels, do the paired labels `tgts` already have all-zero values on the pixels belonging to the test labels? Or do you apply a different preprocessing strategy to `tgts` during inference? I drew a sketch that will hopefully make this clearer:
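The question can be made concrete with a minimal pure-Python sketch. It assumes, without confirmation from this thread, that masking works by replacing whole patch tokens of `tgts` with a mask token right after `patch_embed`; `embed_patches`, `mask_tokens`, and the scalar mask value are hypothetical stand-ins. Under that assumption, and given the per-patch embedding described in the earlier comment, the pixels under masked patches never reach the model, so the test label's content (real values or zeros) yields identical tokens.

```python
# Hypothetical sketch of "embed tgts, then mask tokens". The masking rule
# (overwrite whole tokens with a mask token) is an ASSUMPTION, not taken
# from Painter's code.

def embed_patches(img, patch_size):
    """Stand-in for patch_embed: one scalar token per non-overlapping patch.

    Equivalent to a 1-channel conv with kernel_size == stride == patch_size
    and uniform weights 1 / patch_size**2 (the patch mean). Row-major order.
    """
    h, w = len(img), len(img[0])
    tokens = []
    for pi in range(0, h - patch_size + 1, patch_size):
        for pj in range(0, w - patch_size + 1, patch_size):
            patch = [img[pi + di][pj + dj]
                     for di in range(patch_size) for dj in range(patch_size)]
            tokens.append(sum(patch) / len(patch))
    return tokens


def mask_tokens(tokens, bool_masked_pos, mask_token):
    """Replace every token whose mask flag is True with `mask_token`."""
    return [mask_token if masked else tok
            for tok, masked in zip(tokens, bool_masked_pos)]


if __name__ == "__main__":
    # Left patch: visible prompt label. Right patch: test label to predict.
    real_label = [[1.0, 2.0, 9.0, 9.0], [3.0, 4.0, 9.0, 9.0]]
    zero_label = [[1.0, 2.0, 0.0, 0.0], [3.0, 4.0, 0.0, 0.0]]
    bool_masked_pos = [False, True]
    MASK = -1.0  # hypothetical mask-token value
    a = mask_tokens(embed_patches(real_label, 2), bool_masked_pos, MASK)
    b = mask_tokens(embed_patches(zero_label, 2), bool_masked_pos, MASK)
    print(a, b)  # [2.5, -1.0] [2.5, -1.0]
```

Note that this reasoning only holds if the masked region is aligned to patch boundaries and each masked token is fully overwritten; whether Painter's `forward()` does exactly this would need to be checked against the repository.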