amazon-science / patchcore-inspection

Apache License 2.0

[BUG] Wrong GPU assignment #61

Open YoojLee opened 1 year ago

YoojLee commented 1 year ago

Bug description

The latest code on the main branch appears to have a bug in GPU assignment during the nearest-neighbor search (Faiss) performed at inference.

I tested this in two environments:

Regardless of which GPU I assigned to the code, GPU 0 unexpectedly took part in the nearest-neighbor search at inference (the search used to compute the anomaly score). This may lead to computational inefficiency.

Reproduction of the failure case

I reproduced the error as shown in the figures below (I assigned GPU 1 to the code, but at inference GPU 0 is unexpectedly involved in the process):

[image] When computing features from the training images, GPU 0 is idle and no data is assigned to it. GPU 1 appears to be the only device participating in the anomaly detection procedure.

[image] However, once inference begins, data is assigned to GPU 0 and the nearest-neighbor search runs on GPU 0 (memory is allocated and utilization is non-zero).

The bug is caused by line 46 of common.py, shown below:

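import faiss

class FaissNN(object):
    ...
    def _create_index(self, dimension):
        if self.on_gpu:
            # GpuIndexFlatConfig() keeps its default device (0), so the index
            # is always built on GPU 0, whichever GPU the rest of the model uses.
            return faiss.GpuIndexFlatL2(
                faiss.StandardGpuResources(), dimension, faiss.GpuIndexFlatConfig()
            )
        return faiss.IndexFlatL2(dimension)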

Suggestion

The original code passes a default faiss.GpuIndexFlatConfig() to GpuIndexFlatL2, and that configuration's default GPU device is 0. I modified the code to set the intended device on the configuration (details are provided in the pull request I made).
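A minimal sketch of this kind of change, not the code from the pull request: it assumes the target device reaches `FaissNN` through a hypothetical `device` argument, and it uses a hypothetical helper, `faiss_device_index`, to turn a PyTorch-style device such as `"cuda:1"` into the integer GPU id Faiss expects. The essential step is setting `device` on `faiss.GpuIndexFlatConfig` before constructing `faiss.GpuIndexFlatL2`.

```python
from typing import Union


def faiss_device_index(device: Union[str, int, None]) -> int:
    """Hypothetical helper: map a PyTorch-style device spec to a Faiss GPU id.

    "cuda:1" -> 1, "cuda" -> 0, 2 -> 2, None -> 0. Anything whose str() is
    "cuda" or "cuda:N" (for example a torch.device) is accepted as well.
    """
    if device is None:
        return 0
    if isinstance(device, int):
        return device
    kind, _, index = str(device).partition(":")
    if kind != "cuda":
        raise ValueError(f"expected a CUDA device, got {device!r}")
    return int(index) if index else 0


class FaissNN:
    """Simplified stand-in for FaissNN in common.py (the `device` arg is hypothetical)."""

    def __init__(self, on_gpu: bool = False, device: Union[str, int, None] = None) -> None:
        self.on_gpu = on_gpu
        self.device_id = faiss_device_index(device)
        self._gpu_resources = None  # created lazily on the first GPU index build

    def _create_index(self, dimension: int):
        import faiss  # lazy import, so the helper above works without Faiss installed

        if self.on_gpu:
            if self._gpu_resources is None:
                # Keep a reference so the resources outlive the indexes that use them.
                self._gpu_resources = faiss.StandardGpuResources()
            config = faiss.GpuIndexFlatConfig()
            config.device = self.device_id  # defaults to 0; point it at the intended GPU
            return faiss.GpuIndexFlatL2(self._gpu_resources, dimension, config)
        return faiss.IndexFlatL2(dimension)
```

With this sketch, `FaissNN(on_gpu=True, device="cuda:1")._create_index(1024)` would build the flat L2 index on GPU 1 instead of GPU 0; the GPU path requires a GPU-enabled Faiss build.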

After the fix:

[image] [image] The issue is resolved, as the figures show.

[image] I also confirmed that the fix has no adverse effect on performance.