brj0 / nndescent

C++/Python implementation of Nearest Neighbor Descent for efficient approximate nearest neighbor search
BSD 2-Clause "Simplified" License

Memory efficiency and issue with AMD CPU #1

Open roman-bushuiev opened 7 months ago

roman-bushuiev commented 7 months ago

Hello! I would like to try out your C++ implementation of NNDescent because the PyNNDescent implementation does not fit into 1.5 TB of memory (my data matrix is 70,000,000 x 1024). Could you advise me if your implementation is more memory-efficient? From the README, I found that it should be faster, but what about memory usage?
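For scale, the raw size of such a matrix can be estimated with a few lines of arithmetic. This is only a rough sketch: the 4-byte (float32) element size is an assumption, since the post does not state the dtype of the 70,000,000 x 1024 matrix, and `n_neighbors=30` is a hypothetical value chosen for illustration. The helper names are made up here and are not part of nndescent or PyNNDescent. The estimate covers only the input data and the final neighbor graph, not the temporary structures an NNDescent implementation allocates while building the graph.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=4):
    """Size in bytes of a dense n_rows x n_cols matrix (4 bytes = float32, assumed)."""
    return n_rows * n_cols * bytes_per_value


def knn_graph_bytes(n_points, n_neighbors, index_bytes=4, distance_bytes=4):
    """Size in bytes of a k-NN graph stored as one index and one distance per edge."""
    return n_points * n_neighbors * (index_bytes + distance_bytes)


# Figures from the post; dtype and n_neighbors are assumptions.
data_bytes = dense_matrix_bytes(70_000_000, 1024, bytes_per_value=4)
graph_bytes = knn_graph_bytes(70_000_000, n_neighbors=30)

print(f"input matrix:   {data_bytes / 1e9:.2f} GB")
print(f"neighbor graph: {graph_bytes / 1e9:.2f} GB")
```

Under these assumptions the input alone takes roughly 287 GB, so a build that exceeds 1.5 TB would be spending most of its memory on copies and intermediate structures rather than on the data itself.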

Also, I found that the Usage example works fine on an Intel CPU (Intel Xeon Processor (Skylake, IBRS)), but it crashes on an AMD CPU (AMD EPYC 7543 32-Core Processor) with the following error:

>>> import numpy as np
>>> import nndescent
>>> data = np.random.randint(50, size=(20,3)).astype(np.float32)
>>> nnd = nndescent.NNDescent(data, n_neighbors=4)
Illegal instruction

Do you know what could be the issue? Thank you in advance!
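One way to narrow down an "Illegal instruction" crash is to compare the instruction-set flags the two CPUs report. The sketch below assumes a Linux host, where `/proc/cpuinfo` lists them on a `flags` line. Whether an instruction-set mismatch is the actual cause here is a guess, not something the thread confirms, and the flag names checked (`avx2`, `fma`, `avx512f`) are illustrative examples rather than a list of what nndescent requires. The function names are hypothetical.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # All cores normally report the same flags, so the first hit suffices.
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags of the current machine (Linux only, by assumption)."""
    with open(path) as f:
        return parse_cpu_flags(f.read())


def missing_flags(required, available):
    """Return the required flags that the CPU does not report, sorted."""
    return sorted(set(required) - set(available))


# Example with made-up cpuinfo text; on a real machine use read_cpu_flags().
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 fma\n"
print(missing_flags(["avx2", "fma", "avx512f"], parse_cpu_flags(sample)))
```

Running `missing_flags(..., read_cpu_flags())` on both machines shows whether the AMD host lacks a feature that the Intel host has. If it does, rebuilding the package on the AMD machine itself may be worth trying.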

brj0 commented 7 months ago

I haven't had the opportunity to test the algorithm on such large datasets, and it hasn't been optimized for memory consumption. Therefore, it's likely that the memory usage will be similar to that of PyNNDescent. As for the error you encountered on the AMD CPU, the implementation was developed and tested on Intel chips, so I'm unable to diagnose the specific issue.

roman-bushuiev commented 7 months ago

Ok, thank you for the information.