Closed Howeng98 closed 12 months ago
Hi, I'm glad you're interested in our work.
The image_features contain the global information of an image, which is the sum of all the local detailed information. To perform anomaly segmentation, specific detailed information at each position in the image is often required, which _cannot be directly reflected through image_features alone_.
Therefore, I believe that it is not possible to obtain anomaly maps solely through image_features and text_features. However, image_features do contain certain valuable information. You can consider how to use this information in addition, instead of completely discarding patch_tokens.
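To make the shape argument concrete, here is a minimal NumPy sketch (not the repository's code). It shows that multiplying image_features by text_features yields only one score per image, and it illustrates one hypothetical way to use that score alongside a patch-level map. The function names, the choice of index 1 as the "anomalous" class, and the blending weight are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def image_level_score(image_features, text_features):
    """(B, C) global features with (B, C, 2) text features -> (B,) anomaly probability.

    There is no spatial axis in the result, so this cannot be reshaped into a map.
    """
    logits = np.einsum("bc,bck->bk", image_features, text_features)  # (B, 2)
    return softmax(logits, axis=-1)[:, 1]  # assumption: index 1 = anomalous

def combined_score(image_features, text_features, anomaly_map, weight=0.5):
    """Hypothetical fusion: blend the global score with the peak of a patch-level map."""
    global_score = image_level_score(image_features, text_features)          # (B,)
    local_score = anomaly_map.reshape(anomaly_map.shape[0], -1).max(axis=1)  # (B,)
    return weight * global_score + (1.0 - weight) * local_score

rng = np.random.default_rng(0)
B, C = 2, 768
image_features = rng.standard_normal((B, C))
text_features = rng.standard_normal((B, C, 2))
anomaly_map = rng.random((B, 24, 24))  # stand-in for a map computed from patch_tokens

scores = combined_score(image_features, text_features, anomaly_map)
print(scores.shape)  # (2,)
```

In this sketch the global feature only contributes an image-level (classification-style) score; the pixel-level map still has to come from patch_tokens.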
Hi, thanks for contributing this nice work. I have a question for discussion.
Question: How can we use image_features (train.py, line 112) instead of patch_tokens with the ResNet50 backbone? Do you have any suggestions on how to achieve this?
In the original code (with the ResNet50 backbone), you multiply patch_tokens at several scales by text_features (a batched matrix product rather than an element-wise one, judging by the shapes):
- (B, 9216, 768) and (B, 768, 2) => (B, 9216, 2)
- (B, 2304, 768) and (B, 768, 2) => (B, 2304, 2)
- (B, 576, 768) and (B, 768, 2) => (B, 576, 2)

Each result is then reshaped and interpolated to the target anomaly map size, and so on.
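For reference, the patch-level pipeline described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the repository's implementation: it assumes each scale forms a square grid (24, 48, and 96 patches per side), uses nearest-neighbour upsampling as a stand-in for the interpolation step, takes index 1 as the "anomalous" class, and averages the per-scale maps. The function name patch_anomaly_map is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patch_anomaly_map(patch_tokens, text_features, out_size):
    """(B, N, C) patch tokens with (B, C, 2) text features -> (B, out_size, out_size) map."""
    B, N, _ = patch_tokens.shape
    side = int(round(N ** 0.5))
    if side * side != N:
        raise ValueError("expected a square patch grid")
    if out_size % side != 0:
        raise ValueError("out_size must be a multiple of the grid side")
    logits = patch_tokens @ text_features      # batched matmul -> (B, N, 2)
    probs = softmax(logits, axis=-1)[..., 1]   # (B, N), assumption: index 1 = anomalous
    grid = probs.reshape(B, side, side)        # restore the spatial layout
    scale = out_size // side
    # Nearest-neighbour upsampling (stand-in for bilinear interpolation).
    return grid.repeat(scale, axis=1).repeat(scale, axis=2)

rng = np.random.default_rng(0)
B, C, out_size = 1, 768, 288
text_features = rng.standard_normal((B, C, 2), dtype=np.float32)

maps = []
for n_patches in (9216, 2304, 576):  # 96x96, 48x48, 24x24 grids
    patch_tokens = rng.standard_normal((B, n_patches, C), dtype=np.float32)
    maps.append(patch_anomaly_map(patch_tokens, text_features, out_size))

anomaly_map = np.mean(maps, axis=0)  # assumption: average the per-scale maps
print(anomaly_map.shape)  # (1, 288, 288)
```

The key point is that the N axis of patch_tokens carries the spatial layout, which is what makes the reshape into a 2D map possible.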
But the image_features shape is (B, 768) and the text_features shape is (B, 768, 2). How should we modify the remaining steps so that we can still train the linear layers and generate anomaly maps at inference?
If you have any questions, feel free to ask. Thanks!