Closed Arclabs001 closed 7 years ago
The PCA and OPQ stuff (vector preprocessors) is only implemented on the CPU at present. However, in that script it should only be training on 1,000,000 vectors, so it shouldn't take too long. Are you compiling with optimizations? How many CPU cores are on your machine? Also, the 1,000,000 was chosen for the original 1 billion vectors; you could try decreasing the preprocessing training size to 100,000 or so.
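The suggestion above (shrinking the preprocessor training set) can be sketched as a simple random subsample. This is a minimal illustration with NumPy, assuming your training vectors are already in a float32 array; the function name `sample_training_set` is hypothetical, not a faiss API:

```python
import numpy as np

def sample_training_set(xt, n_train=100_000, seed=123):
    """Randomly subsample the vectors used to train the preprocessor.

    A smaller training set (e.g. 100k instead of 1M) makes the
    CPU-side PCA/OPQ training much faster, usually with little
    loss in preprocessor quality.
    """
    rs = np.random.RandomState(seed)
    if xt.shape[0] <= n_train:
        return xt
    idx = rs.choice(xt.shape[0], size=n_train, replace=False)
    return xt[idx]

# example: 500k random 20-D vectors, keep 100k for training
xt = np.random.rand(500_000, 20).astype('float32')
xt_small = sample_training_set(xt)
print(xt_small.shape)
```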
As for the second error, the comment in the assertion is wrong, it should be "unsupported with not using precomputed codes". This particular case hasn't been instantiated in the code, and I haven't yet written a generic-sized (but slower) fallback.
To handle 20 dims per code, add a `case 20:`
here:
and a `case 20:`
here,
then recompile, and I think it should work. I'll fix the typo and add the 20-dim case on my end.
Thanks. I've now reduced the dimension to 256 and it works fine. Thanks for your help.
Hi, I have randomly generated a 10M x 400-D array to index and search. But when I try to train the preprocessor with OPQ20_80 and
train_preprocessor()
exactly the same as in bench_gpu_1bn.py, the first time I trained it, it took only 602 s. However, when I trained it again, it ran for more than an hour without finishing, so I stopped it; I don't know how long I would have to keep waiting. Can I train it on the GPU to accelerate the process?
Another issue: when I create an IndexIVFPQ index without a preprocessor (i.e., 400-D vectors and PQ20), the program returns an error like this:
Is there anything wrong with the dimensions? As 400 % 20 should be zero.
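As background on the question above: divisibility only determines how many dimensions each PQ sub-quantizer handles; it is necessary but not sufficient on the GPU path, which additionally needs a specialized kernel instantiated for that per-code dimensionality (the exact set of instantiated sizes lives in the faiss GPU source and is not reproduced here). A minimal sketch of the dimension arithmetic, with a hypothetical helper name:

```python
def dims_per_code(d, m):
    """Dimensions handled by each PQ sub-quantizer,
    for d-dimensional vectors split across m sub-quantizers."""
    assert d % m == 0, "d must be divisible by the number of sub-quantizers"
    return d // m

# The setup in question: 400-D vectors with PQ20
print(dims_per_code(400, 20))   # 20 dims per code

# The setup that worked after reducing to 256-D with PQ20
print(dims_per_code(256, 20) if 256 % 20 == 0 else "256 not divisible by 20")
```

So with 400-D vectors and PQ20, each code covers 20 dimensions, and the assertion fires because that size had not been instantiated, not because the dimensions fail to divide evenly.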
My runtime environment is 4x K40 GPUs.
Thanks.
The code is here: Code for Problem 1, Code for Problem 2