amazon-science / patchcore-inspection

Apache License 2.0
719 stars 146 forks

(Urgent!!!) Search for the optimal threshold of converting patch core prediction score matrix into binary matrix. #51

Open TheWangYang opened 1 year ago

TheWangYang commented 1 year ago

I want to convert the two-dimensional score matrix predicted by PatchCore into bounding boxes of the kind used in object detection, but first I need to find an appropriate threshold to convert the score matrix into a binary matrix. How should I choose the threshold? Has anyone found an appropriate threshold when doing similar tasks?

The relevant code is as follows:

__load_and_evaluate_patchcore.py__:

line 85 scores, segmentations, labels_gt, masks_gt = PatchCore.predict(
line 86                    dataloaders["testing"]
line 87                )

The code above produces the PatchCore model's predictions. The 'PatchCore.predict()' function is defined in patchcore.py as follows:

---begin from line 178---
    def predict(self, data):
        if isinstance(data, torch.utils.data.DataLoader):
            return self._predict_dataloader(data)
        return self._predict(data)

    def _predict_dataloader(self, dataloader):
        """This function provides anomaly scores/maps for full dataloaders."""
        _ = self.forward_modules.eval()

        scores = []
        masks = []
        labels_gt = []
        masks_gt = []
        with tqdm.tqdm(dataloader, desc="Inferring...", leave=False) as data_iterator:
            for image in data_iterator:
                if isinstance(image, dict):
                    labels_gt.extend(image["is_anomaly"].numpy().tolist())
                    masks_gt.extend(image["mask"].numpy().tolist())
                    image = image["image"]
                _scores, _masks = self._predict(image)
                for score, mask in zip(_scores, _masks):
                    scores.append(score)
                    masks.append(mask)
        # Parameters are: Picture score, picture masks, tag ground-truth, and ground-truth picture masks
        return scores, masks, labels_gt, masks_gt
---end with line 202---

The 'self._predict()' function is defined as follows (also in patchcore.py):

begin from line 204:
    def _predict(self, images):
        """Infer score and mask for a batch of images."""
        images = images.to(torch.float).to(self.device)
        _ = self.forward_modules.eval()

        batchsize = images.shape[0]
        with torch.no_grad():
            features, patch_shapes = self._embed(images, provide_patch_shapes=True)
            features = np.asarray(features)

            patch_scores = image_scores = self.anomaly_scorer.predict([features])[0]
            image_scores = self.patch_maker.unpatch_scores(
                image_scores, batchsize=batchsize
            )
            image_scores = image_scores.reshape(*image_scores.shape[:2], -1)
            image_scores = self.patch_maker.score(image_scores)

            patch_scores = self.patch_maker.unpatch_scores(
                patch_scores, batchsize=batchsize
            )
            scales = patch_shapes[0]
            patch_scores = patch_scores.reshape(batchsize, scales[0], scales[1])
            masks = self.anomaly_segmentor.convert_to_segmentation(patch_scores)
        return [score for score in image_scores], [mask for mask in masks]
end with line 227

I just want to convert 'segmentations', which appears in __load_and_evaluate_patchcore.py__ at lines 85, 86, 87, into a binary matrix using an appropriate threshold.

Denote the threshold as thres. When a value in the score matrix segmentations exceeds thres, it is set to 1; otherwise, it is set to 0.

line 85 scores, segmentations, labels_gt, masks_gt = PatchCore.predict(
line 86                    dataloaders["testing"]
line 87                )

So what should I do? I am very eager to hear your answers and would be very grateful!

qrmt commented 1 year ago

If you check bin/load_and_evaluate_patchcore.py, you can see how it's done for a complete dataset in a test scenario:

... from line 76 onwards:

          for i, PatchCore in enumerate(PatchCore_list):
              torch.cuda.empty_cache()
              LOGGER.info(
                  "Embedding test data with models ({}/{})".format(
                      i + 1, len(PatchCore_list)
                  )
              )
              scores, segmentations, labels_gt, masks_gt = PatchCore.predict(
                  dataloaders["testing"]
              )
              aggregator["scores"].append(scores)
              aggregator["segmentations"].append(segmentations)

          scores = np.array(aggregator["scores"])
          min_scores = scores.min(axis=-1).reshape(-1, 1)
          max_scores = scores.max(axis=-1).reshape(-1, 1)
          scores = (scores - min_scores) / (max_scores - min_scores)
          scores = np.mean(scores, axis=0)

          segmentations = np.array(aggregator["segmentations"])
          min_scores = (
              segmentations.reshape(len(segmentations), -1)
              .min(axis=-1)
              .reshape(-1, 1, 1, 1)
          )
          max_scores = (
              segmentations.reshape(len(segmentations), -1)
              .max(axis=-1)
              .reshape(-1, 1, 1, 1)
          )
          segmentations = (segmentations - min_scores) / (max_scores - min_scores)
          segmentations = np.mean(segmentations, axis=0)

So the image-level scores are first scaled to lie in [0, 1], and then the segmentation maps are scaled to [0, 1] in the same way. At this point, you could threshold them, for example at 0.5, to get binary segmentation maps.
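As a rough illustration of that last step, here is a minimal sketch for a single model. It assumes `maps` is a NumPy array of raw segmentation maps with shape (N, H, W); the helper names `binarize` and `bounding_boxes` are made up for this example and are not part of the patchcore package, and 0.5 is only the example threshold mentioned above. The box extraction uses a simple 4-connected flood fill.

```python
from collections import deque

import numpy as np


def binarize(maps, threshold=0.5):
    """Min-max scale all maps of one model to [0, 1], then threshold them."""
    maps = np.asarray(maps, dtype=float)
    lo, hi = maps.min(), maps.max()
    scaled = (maps - lo) / max(hi - lo, 1e-12)  # guard against a flat input
    return (scaled > threshold).astype(np.uint8)


def bounding_boxes(binary):
    """Return (row_min, col_min, row_max, col_max) per 4-connected blob."""
    height, width = binary.shape
    seen = np.zeros(binary.shape, dtype=bool)
    boxes = []
    for r, c in zip(*np.nonzero(binary)):
        if seen[r, c]:
            continue
        seen[r, c] = True
        queue = deque([(r, c)])
        r0, c0, r1, c1 = r, c, r, c
        while queue:
            y, x = queue.popleft()
            r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and binary[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((int(r0), int(c0), int(r1), int(c1)))
    return boxes
```

Calling `bounding_boxes(binarize(maps)[i])` then gives the boxes for the i-th image. Note that the min/max here come from the very set being evaluated, just like in the script.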

However, this probably won't solve your problem. If you first train your PatchCore model and then later want to predict single images, you won't have any information to scale the output maps/scores with. One option might be to save the min/max scores from the training set and use them to threshold the output maps (e.g. use the training set's maximum unscaled score as the threshold for future images, or scale the maps so that the training set's min/max corresponds to [0.0, 0.5] when predicting new images).
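A minimal sketch of the second variant, assuming `train_maps` holds raw (unscaled) score maps computed on defect-free images. The function names are hypothetical and not part of the patchcore package. Mapping the stored range onto [0.0, 0.5] and thresholding at 0.5 is the same as flagging every raw score above the stored maximum.

```python
import numpy as np


def fit_train_range(train_maps):
    """Store the smallest and largest raw score seen on defect-free data."""
    train_maps = np.asarray(train_maps, dtype=float)
    return float(train_maps.min()), float(train_maps.max())


def scale_with_train_range(score_map, train_min, train_max):
    """Map [train_min, train_max] onto [0.0, 0.5]; larger scores exceed 0.5."""
    span = max(train_max - train_min, 1e-12)  # guard against a zero range
    return 0.5 * (np.asarray(score_map, dtype=float) - train_min) / span
```

A new image's map can then be binarized with `scale_with_train_range(new_map, lo, hi) > 0.5`, where `lo, hi` come from `fit_train_range`.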

A better option would be to optimize the threshold using a validation set that contains both good and bad images, which you may or may not have. This is basically what is done in the code above (which in this case leaks information about the test set! I hope the reported scores in this repo are not based on these values).
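One way this could look, as a sketch only: sweep candidate thresholds over the validation scores and keep the one with the best F1. F1 is my choice of criterion here, not something the repo prescribes, and `best_f1_threshold` is a hypothetical helper. `scores` can be image-level scores or flattened pixel scores, and `labels` the matching 0/1 ground truth.

```python
import numpy as np


def best_f1_threshold(scores, labels, num_candidates=101):
    """Return (threshold, f1) maximizing F1 for the rule `score > threshold`."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    candidates = np.linspace(scores.min(), scores.max(), num_candidates)
    best_thr, best_f1 = float(candidates[0]), -1.0
    for thr in candidates:
        pred = scores > thr
        tp = int(np.sum(pred & labels))    # anomalies correctly flagged
        fp = int(np.sum(pred & ~labels))   # normal samples flagged
        fn = int(np.sum(~pred & labels))   # anomalies missed
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        if f1 > best_f1:
            best_thr, best_f1 = float(thr), f1
    return best_thr, best_f1
```

The chosen threshold should then be kept fixed and applied to the test set or to new images, so that no test information leaks into it.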

If someone has any good ideas on this, I would be glad to hear them as well.

TheWangYang commented 1 year ago

Thanks for your detailed and careful answer. I would like to ask another question: why is the ground-truth mask passed to the evaluation function (that is, the numpy array) not binary, but an array with values in [0, 1]? Shouldn't a mask contain only 0 or 255 (0 and 1 after normalization)? Looking forward to your reply! @qrmt

TheWangYang commented 1 year ago

Does this mean that I still need to select a threshold for the ground-truth mask for binarization? @qrmt

qrmt commented 1 year ago

I'm not sure which evaluation function you are referring to, but if it's patchcore.metrics.compute_pixelwise_retrieval_metrics, you pass the predicted segmentation maps (floats in the range [0, 1]) as the first argument and the binarized ground-truth masks as the second argument. Your GT masks should be in binary form if you have them; no thresholding is necessary.
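If you want to verify this before computing metrics, a small guard like the following could help. It is only a sketch: `is_binary_mask` and `check_metric_inputs` are hypothetical helpers, not functions from the patchcore package.

```python
import numpy as np


def is_binary_mask(mask):
    """True if every value in the mask is exactly 0 or 1."""
    return bool(np.isin(np.asarray(mask), [0, 1]).all())


def check_metric_inputs(segmentations, masks_gt):
    """Validate float predictions and binary ground truth of equal size."""
    segmentations = np.asarray(segmentations, dtype=float)
    masks_gt = np.asarray(masks_gt)
    if segmentations.size != masks_gt.size:
        raise ValueError("predictions and ground truth differ in size")
    if not is_binary_mask(masks_gt):
        raise ValueError("ground-truth masks contain values other than 0/1")
    return segmentations, masks_gt.astype(int)
```

A ValueError from the second check means the masks picked up fractional values somewhere in the loading pipeline.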

TheWangYang commented 1 year ago

I am sorry that I did not describe my problem clearly. The function I am referring to is exactly the one you mentioned, but the GT mask I get is not binary when I pass it in. Is this related to the filtering used when the GT mask is created? Thank you very much for your timely and patient answer. Looking forward to your reply! @qrmt

flyinghu123 commented 1 year ago

self.transform_mask = [
    transforms.Resize(resize),  # , transforms.InterpolationMode.NEAREST
    transforms.CenterCrop(imagesize),
    transforms.ToTensor(),
]

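I think you should look at how the masks are processed in the repo's MVTec Dataset class. It does not use nearest-neighbor interpolation when resizing, which is why the masks end up non-binary, but a >0.5 threshold is applied later when the metrics are computed.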
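To see why a resized mask stops being binary, here is a small NumPy sketch. Block averaging is used only as a stand-in for any non-nearest interpolation (it is not what torchvision's Resize does internally), and both helper names are made up for this example.

```python
import numpy as np


def downsample_by_averaging(mask, factor=2):
    """Shrink a 2D mask by averaging factor x factor blocks."""
    h, w = mask.shape
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def binarize_mask(mask, threshold=0.5):
    """Restore a 0/1 mask from a mask with fractional edge values."""
    return (np.asarray(mask) > threshold).astype(np.uint8)


mask = np.zeros((4, 4))
mask[1:4, 1:4] = 1.0                      # binary ground-truth region
soft = downsample_by_averaging(mask)      # edges now hold 0.25 and 0.5
hard = binarize_mask(soft)                # back to 0/1 values only
```

Alternatively, passing `interpolation=transforms.InterpolationMode.NEAREST` to `transforms.Resize` for the mask transform (as hinted in the comment in the snippet above) should keep the mask binary at the source.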