facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

the computation of the standard array for foreground/background separation. #220

Open AHammoudeh opened 1 year ago

AHammoudeh commented 1 year ago

Regarding the code below from the features notebook: the standard array, which is used to separate foreground from background, is loaded from a numpy file hosted on the web. Can someone please advise how that standard array was computed? It is a vector of 768 numbers.

```python
# Precomputed foreground / background projection
STANDARD_ARRAY_URL = "https://dl.fbaipublicfiles.com/dinov2/arrays/standard.npy"
standard_array = load_array_from_url(STANDARD_ARRAY_URL)
```
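For context, a hedged sketch of how such a 768-dim projection vector is presumably applied (this is an assumption, not the notebook's exact code): each patch embedding is projected onto the vector and thresholded. Here `patch_tokens` and `standard_array` are random stand-ins for real DINOv2 ViT-B/14 patch features and the downloaded array.

```python
import numpy as np

rng = np.random.default_rng(0)
# Dummy stand-ins: 16x16 grid of 768-dim patch embeddings and a projection vector.
patch_tokens = rng.normal(size=(16 * 16, 768)).astype(np.float32)
standard_array = rng.normal(size=768).astype(np.float32)

# Project each patch onto the vector; patches with a positive score are
# treated as foreground (assuming the vector points toward the foreground).
scores = patch_tokens @ standard_array
foreground_mask = scores > 0  # boolean mask, one entry per patch
```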

zhaoc5 commented 1 year ago

+1

alejandroHdzV commented 1 year ago

Any updates on how to compute the precomputed foreground / background projection?

ccl-private commented 8 months ago

+1

tabsun commented 2 months ago

any update on this question?

tomasssterns commented 2 months ago

+1

crypdick commented 2 months ago

I'm not one of the authors but I assume that they took the PCA of a bunch of images and then took the vector of the first component (pca.components_[0]). I'd spot check if this vector points towards the background objects, and if so, multiply it by -1 to make it point towards the foreground objects so that the thresholding works as expected.

Having this pre-computed vector is nice because 1) it makes a cleaner foreground/background separation, and 2) it's cheaper to take a dot product of your patch embedding than to do two rounds of training PCA.
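A minimal sketch of that guess (again, an assumption rather than the authors' published recipe), using random features as a stand-in for DINOv2 patch embeddings collected from many images:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for patch embeddings gathered across images: (n_patches, 768).
features = rng.normal(size=(2_000, 768)).astype(np.float32)

# PCA via SVD on mean-centered features; vt[0] is the first principal axis,
# i.e. the equivalent of sklearn's pca.components_[0].
centered = features - features.mean(axis=0, keepdims=True)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
standard_array = vt[0]  # shape (768,), unit norm

# Spot check the sign: if projections are positive on known-background
# patches (hypothetical labeled indices here), flip the vector so that
# positive scores mean foreground and a simple `> 0` threshold works.
background_idx = np.arange(100)  # placeholder for labeled background patches
if (features[background_idx] @ standard_array).mean() > 0:
    standard_array = -standard_array
```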

tabsun commented 2 months ago

> I'm not one of the authors but I assume that they took the PCA of a bunch of images and then took the vector of the first component (pca.components_[0]). I'd spot check if this vector points towards the background objects, and if so, multiply it by -1 to make it point towards the foreground objects so that the thresholding works as expected.
>
> Having this pre-computed vector is nice because 1) it makes a cleaner foreground/background separation, and 2) it's cheaper to take a dot product of your patch embedding than to do two rounds of training PCA.

I think this is right. Note that the set of images selected for the PCA analysis will effectively determine the resulting mask.