Open tophus00 opened 1 year ago
Hello! I have read your code and found that the anomaly map in test_mvtec.py is computed entirely from cosine similarity with the few-shot normal samples, and that Image AUC and Pixel AUC are also calculated from this anomaly map. It seems this alone already achieves anomaly detection, so why do we still need the text encoder and LLaMA?
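For reference, a minimal sketch of the kind of cosine-similarity scoring the question refers to. This is not the repository's exact code; the function name, tensor shapes, and the nearest-normal-patch scoring rule are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def cosine_anomaly_map(query_patches, normal_patches, out_size=224):
    """Hypothetical sketch: score each query patch by its distance to the
    most similar patch among the few-shot normal samples.

    query_patches:  (N_q, D) patch features of the test image
    normal_patches: (N_m, D) patch features pooled from the K normal shots
    Returns an (out_size, out_size) anomaly map.
    """
    q = F.normalize(query_patches, dim=-1)
    m = F.normalize(normal_patches, dim=-1)
    sim = q @ m.t()                          # (N_q, N_m) cosine similarities
    # Anomaly score = 1 - similarity to the closest normal patch
    scores = 1.0 - sim.max(dim=-1).values
    side = int(scores.numel() ** 0.5)        # assume a square patch grid
    amap = scores.reshape(1, 1, side, side)
    amap = F.interpolate(amap, size=(out_size, out_size),
                         mode='bilinear', align_corners=False)
    return amap.squeeze()

# Image-level score is then typically the maximum of the map,
# e.g. image_score = cosine_anomaly_map(q_feats, n_feats).max()
```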
Existing industrial anomaly detection methods, which rely only on an anomaly map, can only provide anomaly scores for query samples, and a score threshold for separating normal from abnormal samples has to be set manually. This threshold differs greatly across object categories and can only be determined experimentally when many labeled normal and abnormal samples are available. The figure below shows the accuracy of PatchCore under different thresholds for each category of the MVTec dataset (this figure is also in the supplementary material of our paper). When performing few-shot or zero-shot inference on a new object category, once an existing model outputs an anomaly score the user still does not know whether the item is normal or abnormal, because the score threshold for that category is unknown. Our method, in contrast, can directly give a judgment for user-provided samples through a large language model, which is more practical than existing methods.

In addition to judging whether an anomaly exists and pointing out its location, our method can also describe the content of the test image itself, which to some extent provides a basis for the model's judgment.
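A minimal sketch of the thresholding issue described above. The function and variable names are hypothetical; it only illustrates that the accuracy-vs-threshold curve, and hence the best operating threshold, differs from category to category:

```python
import numpy as np

def accuracy_vs_threshold(scores, labels, thresholds):
    """Hypothetical sketch: classification accuracy obtained by thresholding
    image-level anomaly scores, evaluated over a range of thresholds.

    scores: (N,) anomaly scores for one category
    labels: (N,) ground truth, 1 = anomalous, 0 = normal
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    return np.array([((scores >= t).astype(int) == labels).mean()
                     for t in thresholds])

# A threshold tuned on one class (e.g. 'bottle') may perform poorly on another
# (e.g. 'screw'), which is why a fixed manual threshold does not transfer to
# new categories in few-shot or zero-shot settings.
thresholds = np.linspace(0.0, 1.0, 101)
# acc_bottle = accuracy_vs_threshold(bottle_scores, bottle_labels, thresholds)
# acc_screw  = accuracy_vs_threshold(screw_scores,  screw_labels,  thresholds)
```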