amazon-science / patchcore-inspection


Difference in the projection process between the paper and this implementation #21

Open JefferyChiang opened 2 years ago

JefferyChiang commented 2 years ago

Thanks for the implementation!

This implementation uses torch.nn.Linear as the random linear projection function. Rather than using the Johnson–Lindenstrauss theorem stated in the paper to choose the target dimension based on the input size, it sets dimension_to_project_features_to to a default of 128. The problem is that it will still project the features to 128 dimensions even when the input size is smaller, which I think is unnecessary. Is there any study comparing the results of applying the theorem versus setting the output size to a default of 128?
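To make the concern concrete, here is a minimal sketch of the projection step as I understand it (sizes are illustrative, not the repository's exact code):

```python
# Sketch of a random linear projection via an untrained torch.nn.Linear
# (illustrative dimensions, not the repository's actual defaults).
import torch

num_patches, feature_dim = 784, 1024     # example sizes, chosen arbitrarily
dimension_to_project_features_to = 128   # the default questioned above

features = torch.rand(num_patches, feature_dim)

# A randomly initialized, frozen linear layer acts as the projection.
mapper = torch.nn.Linear(feature_dim, dimension_to_project_features_to, bias=False)
with torch.no_grad():
    projected = mapper(features)

print(projected.shape)  # torch.Size([784, 128]); output is 128-d regardless of feature_dim
```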

Confusezius commented 2 years ago

So the Johnson-Lindenstrauss theorem is used to argue for the validity of random projections into lower-dimensional representation spaces, which we perform via torch.nn.Linear. In all our cases the input dimensionalities were always larger than 128, so we never checked for situations where that may not be the case. In particular, https://github.com/amazon-research/patchcore-inspection/blob/b64be4734cb8295bfbadccf4f6a036b266181e57/bin/run_patchcore.py#L245-L246 denotes the internal dimensionality each feature vector is mapped to via adaptive average pooling before being down-projected for the coreset generation. As such, even for lower feature sizes no error should be thrown.
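For illustration, the pooling step looks roughly like this (a sketch; the internal dimensionality here is an assumed example, not necessarily the repository's default):

```python
# Adaptive average pooling maps a feature vector of any length to a fixed
# internal dimensionality before the random down-projection, so even
# features smaller than 128 end up large enough (illustrative sketch).
import torch
import torch.nn.functional as F

pretrain_embed_dimension = 1024  # assumed internal dimensionality

small_feature = torch.rand(1, 1, 96)  # a feature vector smaller than 128
pooled = F.adaptive_avg_pool1d(small_feature, pretrain_embed_dimension)
print(pooled.shape)  # torch.Size([1, 1, 1024]): now safely larger than 128
```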

lorenzomammana commented 1 year ago

Good day, and first of all thank you for your work; it's truly great.

I come here as a user of the anomalib library, which performs dimensionality reduction during coreset subsampling using the Johnson–Lindenstrauss lemma to select the lowest possible reduced dimension.
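For reference, the lemma-based choice of the reduced dimension looks roughly like this (a sketch using scikit-learn's johnson_lindenstrauss_min_dim helper; the sample count and eps are just example values):

```python
# The JL lemma gives a lower bound on the projection dimension needed to
# preserve pairwise distances up to a distortion eps (example values).
from sklearn.random_projection import johnson_lindenstrauss_min_dim

n_patches = 100_000  # number of patch features to embed
eps = 0.3            # allowed pairwise-distance distortion

min_dim = johnson_lindenstrauss_min_dim(n_samples=n_patches, eps=eps)
print(min_dim)  # minimum target dimension that preserves distances within eps
```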

I was wondering: why is the dimensionality reduction applied only during subsampling, while the memory bank then contains the full features? Applying the concepts of the lemma, would it be possible to get a faster approximation of the nearest-neighbour search by applying a projection to both the memory bank and the features at inference time?

With a very large embedding dimension, I've found cases where that actually improves performance compared to using a full-dimension memory bank, while it generally worsens results with small embedding dimensions. A sketch of what I mean follows below.
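This is roughly the idea (a minimal sketch; the names and sizes are illustrative, not anomalib's actual API):

```python
# Share one random projection between the memory bank and the
# inference-time features, then run nearest-neighbour search in the
# reduced space (illustrative sketch).
import torch

embed_dim, reduced_dim = 1536, 128
memory_bank = torch.rand(10_000, embed_dim)  # full-dimension memory bank
query_features = torch.rand(784, embed_dim)  # features of one test image

projection = torch.nn.Linear(embed_dim, reduced_dim, bias=False)
with torch.no_grad():
    reduced_bank = projection(memory_bank)
    reduced_query = projection(query_features)

# Nearest-neighbour distances computed in the reduced space.
distances = torch.cdist(reduced_query, reduced_bank)
anomaly_scores = distances.min(dim=1).values
```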

@Confusezius Do you think that could be a correct approach?

Confusezius commented 1 year ago

Yes, one could certainly also downsample memory entry dimensionalities and benefit from further reduced inference times, especially if you have seen experimental benefits in doing so! I do want to highlight, though, that the primary motivation behind the use of coresets is to retain as much information as possible for each entry while instead removing redundant embeddings to reduce the overall search space :).
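To illustrate what I mean by removing redundancy rather than dimensions, greedy coreset selection looks roughly like this (a simplified sketch, not the repository's exact code; the random projection discussed above only makes these distance computations cheaper):

```python
# Greedy (k-center style) coreset subsampling: keep full-dimensional
# entries, but drop redundant ones by repeatedly adding the point
# farthest from the current coreset (simplified sketch).
import torch

def greedy_coreset(features: torch.Tensor, n_select: int) -> torch.Tensor:
    selected = [0]  # start from an arbitrary point
    min_dists = torch.cdist(features, features[0:1]).squeeze(1)
    for _ in range(n_select - 1):
        idx = int(min_dists.argmax())  # farthest from current coreset
        selected.append(idx)
        new_dists = torch.cdist(features, features[idx:idx + 1]).squeeze(1)
        min_dists = torch.minimum(min_dists, new_dists)
    return features[selected]

coreset = greedy_coreset(torch.rand(1000, 1024), n_select=100)
print(coreset.shape)  # torch.Size([100, 1024]): fewer entries, full dimension
```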