facebookresearch / deepcluster

Deep Clustering for Unsupervised Learning of Visual Features

PCA error on own dataset #40

Closed: juliangaal closed this issue 5 years ago

juliangaal commented 5 years ago

I'm running into this error with my own dataset:

```
RuntimeError: Error in void faiss::PCAMatrix::prepare_Ab() at VectorTransform.cpp:482: Error: 'd_out * d_in <= PCAMat.size()' failed: PCA matrix cannot output 256 dimensions from 4096
```

My environment setup matches your defined dependencies (except CUDA 10, which may become an issue...?).

Parameters I tested that resulted in the same error:

Thanks for your help!

mathildecaron31 commented 5 years ago

Hi, thanks for your interest! Looking at the error, it seems that you don't have enough data points in your dataset. Indeed, I managed to reproduce your error when I used fewer than 256 points. Have a look at the faiss PCA code: if n, the number of points, is greater than or equal to the point dimension d_in, then PCAMat has d_in * d_in entries and the condition holds. However, if n < d_in (which I assume is your case), then PCAMat only has n * d_in entries, so n needs to be at least d_out for the condition to hold.

In a nutshell, it will throw the error if n < d_out; in your case it seems that your dataset contains fewer than 256 points.
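To make the condition concrete, here is a minimal sketch (random data, not from the repository; it assumes a standard faiss Python install) that reproduces both the failing and the working case:

```python
import numpy as np
import faiss

d_in, d_out = 4096, 256  # same dimensions as in the error above

# Too few points: with n < d_in, PCAMat holds n * d_in entries, so the
# check 'd_out * d_in <= PCAMat.size()' fails whenever n < d_out.
x_small = np.random.rand(100, d_in).astype('float32')
mat = faiss.PCAMatrix(d_in, d_out, eigen_power=-0.5)
# mat.train(x_small)  # would raise the RuntimeError reported above

# Enough points: n >= d_out, so the same training call succeeds.
x_ok = np.random.rand(300, d_in).astype('float32')
mat.train(x_ok)
```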

It is a pure faiss error, so if this does not solve your issue you might want to post on the faiss repository directly.

Hope it helps

juliangaal commented 5 years ago

Ah, great to hear, thanks a lot. Yes, my first test set is VERY small, just to check that the whole pipeline works. Cheers

skasapis commented 4 years ago

Hey, I had the same problem with a dataset of 100 pictures. In the code I am using, the function sets the pca parameter to 256. You can drop this to, say, 50, and you will solve your problem as I did (see the snippet below)!

```python
def preprocess_features(npdata, pca=256):  # change pca to 50 here
    _, ndim = npdata.shape
    mat = faiss.PCAMatrix(ndim, pca, eigen_power=-0.5)
    mat.train(npdata)
```
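As a quick sanity check (a hypothetical example with random features, assuming 100 images and the repository's 4096-dimensional feature vectors), lowering pca below the dataset size lets training go through:

```python
import numpy as np
import faiss

features = np.random.rand(100, 4096).astype('float32')  # hypothetical: 100 images
mat = faiss.PCAMatrix(4096, 50, eigen_power=-0.5)  # pca lowered to 50 <= 100 points
mat.train(features)            # trains without the prepare_Ab() error
reduced = mat.apply_py(features)  # PCA-whitened features, shape (100, 50)
```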

BEFOURD commented 2 years ago

Make sure your dataset size n satisfies n % batch_size == 0 or n % batch_size >= d_out.