caoyunkang / Segment-Any-Anomaly

Official implementation of "Segment Any Anomaly without Training via Hybrid Prompt Regularization (SAA+)".

How to support unseen types of defects? #21

Closed changtimwu closed 10 months ago

changtimwu commented 1 year ago

By observing the prompts, I found that SAA/SAA+ is quite limited to known types of defects.

Here is an example. SAA/SAA+ has no concept of "stripped screws".

```python
image_path = 'assets/screws.png'
textual_prompts = ['dirty. stripped. spot. ', 'screw']  # defect prompts, filtered phrase
property_text_prompts = 'the image of screw have 4 similar screw, with a maximum of 2 anomaly. The anomaly would not exceed 1. object area. '
```
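The three strings above follow a fixed template (defect phrases, object name, expected object count, maximum anomaly count, and relative anomaly area). Below is a minimal, library-independent sketch of how such prompts could be assembled programmatically so that a new defect phrase like "stripped" is easy to swap in; the helper name and keyword defaults are illustrative assumptions, not part of the SAA+ API.

```python
# Illustrative helper for assembling the SAA+ prompt strings used above.
# The function name and its defaults are assumptions, not the repo's API.

def build_prompts(obj: str, defects: list[str],
                  n_objects: int = 4, max_anomalies: int = 2,
                  max_area: str = '1.') -> dict:
    """Fill the same template as the hard-coded strings in the example."""
    return {
        'textual_prompts': ['. '.join(defects) + '. ', obj],
        'property_text_prompts': (
            f'the image of {obj} have {n_objects} similar {obj}, '
            f'with a maximum of {max_anomalies} anomaly. '
            f'The anomaly would not exceed {max_area} object area. '
        ),
    }

# Trying an unseen defect phrase only requires editing this list.
prompts = build_prompts('screw', ['dirty', 'stripped', 'spot'])
print(prompts['textual_prompts'])
print(prompts['property_text_prompts'])
```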

[detection result images attached]

caoyunkang commented 1 year ago

We appreciate your attention to SAA+.

The capabilities of SAA+ are constrained by the underlying foundation models, namely Grounding DINO and SAM. Because of the data these models were trained on, they can struggle with concepts they rarely encountered, and since SAA+ adapts the existing foundation models to downstream tasks without any training, it inherits those gaps. In our empirical experiments, SAA+ has proven more proficient at identifying structural defects than semantic (contextual) defects.
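To illustrate the training-free adaptation described above, here is a hedged, library-independent sketch of the property-prompt regularization idea: candidate anomaly regions proposed by the text-guided foundation models are kept or discarded according to the constraints in the property prompt (maximum anomaly count, maximum relative area). All names, fields, and thresholds below are illustrative, not the actual SAA+ implementation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    score: float          # confidence from the text-guided detector
    area_fraction: float  # mask area relative to a typical object area

def regularize(candidates: list[Candidate],
               max_anomalies: int = 2,
               max_area_fraction: float = 1.0) -> list[Candidate]:
    """Keep the highest-scoring candidates that satisfy the property prompt:
    at most `max_anomalies` regions, each no larger than `max_area_fraction`
    of the object area."""
    kept = [c for c in candidates if c.area_fraction <= max_area_fraction]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:max_anomalies]

# Example: three raw proposals, one of which is implausibly large.
proposals = [Candidate(0.62, 0.05), Candidate(0.40, 1.8), Candidate(0.31, 0.02)]
print(regularize(proposals))
```

Note that this filtering only reranks and prunes what the foundation models already propose; it cannot recover a defect concept, such as "stripped screw", that Grounding DINO never fires on in the first place.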

Should you require further clarification or information, please feel free to inquire.

changtimwu commented 9 months ago

Hi! I'm currently reading the paper at https://arxiv.org/abs/2311.00871. I noticed that the abstract contains statements that effectively explain why SAA+ is unable to detect unseen anomalies.

> Our empirical results show transformers demonstrate near-optimal unsupervised model selection capabilities, in their ability to first in-context identify different task families and in-context learn within them when the task families are well-represented in their pretraining data. However when presented with tasks or functions which are out-of-domain of their pretraining data, we demonstrate various failure modes of transformers and degradation of their generalization for even simple extrapolation tasks.