Excuse me, in your code, how are attention scores calculated between different patches of the same image in the attention part of your network? My understanding is that during the image preprocessing stage, you segment the image into 128x128 patches, and when an image is loaded the tensor shape is (8, 1, 128, 128). Does this mean that only one patch is loaded per sample? Then, using the OverlapPatchEmbed function, do you expand this patch's channels to 64 and subsequently compute attention scores between channels, rather than between patches?
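For context, here is a minimal sketch of the channel-wise (transposed) attention the question describes, where the (8, 1, 128, 128) input is first lifted to 64 channels and the attention matrix is then C×C per sample instead of (H·W)×(H·W). This is an illustrative NumPy sketch under that assumption (similar to Restormer-style transposed attention), not the repository's actual code; the function names, the toy 16x16 spatial size, and the identity q/k/v projections are all made up for clarity:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x):
    """Attention across channels: each channel is one token of length H*W,
    so the score matrix is (C, C) per sample, not (H*W, H*W)."""
    b, c, h, w = x.shape
    # In a real model q, k, v would come from learned projections;
    # here we use the input directly to keep the sketch self-contained.
    q = x.reshape(b, c, h * w)
    k = x.reshape(b, c, h * w)
    v = x.reshape(b, c, h * w)
    # (B, C, C): scores between channels of the same embedded patch
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(h * w), axis=-1)
    out = scores @ v                      # (B, C, H*W)
    return out.reshape(b, c, h, w), scores

rng = np.random.default_rng(0)
# Toy stand-in for a batch of 8 embedded patches with 64 channels
x = rng.standard_normal((8, 64, 16, 16))
y, attn = channel_attention(x)
print(y.shape, attn.shape)  # (8, 64, 16, 16) (8, 64, 64)
```

Note that with this formulation the attention cost is independent of spatial resolution (the score matrix is 64x64 regardless of patch size), which is the usual motivation for attending over channels instead of spatial positions.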