CASIA-IVA-Lab / AnomalyGPT

[AAAI 2024 Oral] AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
https://anomalygpt.github.io

Why do we need text and LLM? #9

Open · tophus00 opened 1 year ago

tophus00 commented 1 year ago

Hello! I have read your code and found that the anomaly map in test_mvtec.py is computed entirely from cosine similarity with the few-shot normal samples. The Image-AUC and Pixel-AUC are also calculated from this anomaly map. It seems this alone already achieves anomaly detection, so why do we still need the text encoder and LLaMA?
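For readers skimming the thread, here is a minimal sketch of the kind of cosine-similarity anomaly map being discussed (an illustration of the general few-shot memory-bank idea, not the exact code in test_mvtec.py; the function name and tensor shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def anomaly_map_from_memory(query_feats: torch.Tensor,
                            normal_feats: torch.Tensor,
                            out_size: int = 224) -> torch.Tensor:
    """Score each query patch by its distance to the closest normal patch.

    query_feats:  (Hp*Wp, C) patch features of the test image
    normal_feats: (N, C)     patch features pooled from the few-shot
                             normal reference images (the memory bank)
    """
    q = F.normalize(query_feats, dim=-1)
    m = F.normalize(normal_feats, dim=-1)
    sim = q @ m.t()                        # cosine similarity, (Hp*Wp, N)
    # A patch is anomalous if even its best-matching normal patch is dissimilar.
    score = 1.0 - sim.max(dim=-1).values   # (Hp*Wp,)
    hp = wp = int(score.numel() ** 0.5)    # assume a square patch grid
    amap = score.view(1, 1, hp, wp)
    # Upsample to image resolution to get the pixel-level anomaly map.
    amap = F.interpolate(amap, size=out_size, mode="bilinear",
                         align_corners=False)
    return amap.squeeze()
```

Taking the maximum over this map then gives an image-level score, from which Image-AUC and Pixel-AUC can be computed.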

FantasticGNU commented 1 year ago
  1. Existing industrial anomaly detection methods (based only on an anomaly map) can only output anomaly scores for query samples, and the score threshold that separates normal from abnormal samples must be set manually. This threshold varies greatly across item categories and can only be found experimentally when many labeled normal and abnormal samples are available. The figure below shows PatchCore's accuracy under different thresholds for each category of the MVTec dataset (this figure also appears in the supplementary material of our paper), and a toy numerical sketch after the figure illustrates the same point. When performing few-shot or zero-shot inference on a new category of objects, an existing model only gives the user an anomaly score; the user still cannot tell whether the item is normal or abnormal, because the score threshold for that class is unknown. Our method instead has the large language model directly state the judgment for the user's sample, which is more practical than existing methods.

  2. Besides judging whether an anomaly exists and pointing out its location, our method can also describe the content of the test image itself, which to a certain extent provides a basis for the model's judgment.

[Figure: PatchCore's accuracy at different score thresholds for each MVTec category]
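To make the threshold problem concrete, here is a toy sketch (all scores are made up and the two "categories" are hypothetical): within each category the scores separate normal from abnormal perfectly, so per-category AUC is 1.0, yet no single threshold classifies both categories correctly.

```python
import numpy as np

def accuracy_at_threshold(scores: np.ndarray, labels: np.ndarray,
                          thr: float) -> float:
    """Classify scores >= thr as anomalous; labels: 1 = anomalous, 0 = normal."""
    return float(((scores >= thr) == labels).mean())

# Hypothetical image-level anomaly scores for two categories.
scores_a = np.array([0.20, 0.25, 0.30, 0.70, 0.80])  # category A
labels_a = np.array([0,    0,    0,    1,    1])
scores_b = np.array([0.60, 0.65, 0.70, 0.90, 0.95])  # category B
labels_b = np.array([0,    0,    0,    1,    1])

for thr in (0.5, 0.8):
    print(f"thr={thr:.1f}  A acc={accuracy_at_threshold(scores_a, labels_a, thr):.2f}"
          f"  B acc={accuracy_at_threshold(scores_b, labels_b, thr):.2f}")
# thr=0.5  A acc=1.00  B acc=0.40
# thr=0.8  A acc=0.80  B acc=1.00
```

This is the gap the LLM output is meant to close: the model states "normal" or "abnormal" directly, so the user never has to pick a per-category threshold.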