derneuere closed this issue 3 years ago
@akshay9 Maybe that's an enhancement you can implement? 😄
Let me see during the weekend what I can do 🙂
@akshay9 I tried to debug the problem. The main problem is this function: https://github.com/LibrePhotos/librephotos/blob/73a7032a18d25cc7f0e6f4ea7da18c9d50d8e09a/api/image_similarity.py#L58-L80 We load potentially all pictures into RAM. I will patch it to load at most 2.5k images for now.
I think we have to find out how to build the index incrementally, then check again how much memory Faiss needs and whether we can reduce the overhead.
That's great, let me know if that works. I'll still implement the lower-memory-footprint variant of Faiss; it might help on an RPi server.
To-Do: Add a loop to send batches to faiss. If we only index 2500 images, then the semantic search will also only search within these images.
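A rough sketch of what such a batching loop could look like, so that every image ends up in the index without all embeddings sitting in RAM at once. This is not the project's code: `batched`, `add_in_batches`, `to_array`, and the `ListIndex` stand-in are hypothetical names, and the only thing assumed about the index is that it has an `add(batch)` method, as Faiss indexes do.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def batched(items: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield successive lists holding at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def add_in_batches(
    index: Any,
    embeddings: Iterable[Any],
    batch_size: int = 2500,
    to_array: Callable[[List[Any]], Any] = lambda batch: batch,
) -> int:
    """Feed embeddings to index.add() one batch at a time.

    Only one batch is materialised at a time, so `embeddings` can be a
    lazy generator (for example, rows streamed from the database).
    Returns the number of embeddings added.
    """
    total = 0
    for batch in batched(embeddings, batch_size):
        index.add(to_array(batch))
        total += len(batch)
    return total


class ListIndex:
    """Hypothetical stand-in for a Faiss index, used only for this demo."""

    def __init__(self) -> None:
        self.vectors: List[Any] = []
        self.calls = 0

    def add(self, batch: List[Any]) -> None:
        self.vectors.extend(batch)
        self.calls += 1


demo_index = ListIndex()
# Ten fake 4-dimensional embeddings, produced lazily.
added = add_in_batches(demo_index, ([float(i)] * 4 for i in range(10)), batch_size=4)
print(added, demo_index.calls)
```

With a real Faiss index, `to_array` would presumably convert each batch to a float32 NumPy array of shape `(n, 512)` before calling `add`. Note that this only bounds the memory used while loading; the index itself still grows with every batch.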
250k images should take about 250000 × 512 (emb size) × 4 bytes (fp32) = ~500 MB of model RAM usage.
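The same estimate as a small calculation, in case someone wants to plug in their own library size. It assumes a flat, uncompressed index that stores nothing but the raw vectors; `flat_index_bytes` and `format_size` are made-up helper names, and the real footprint during index building may be higher because of temporary copies.

```python
def flat_index_bytes(num_vectors: int, dim: int = 512, bytes_per_component: int = 4) -> int:
    """Raw storage for num_vectors embeddings of `dim` fp32 components."""
    return num_vectors * dim * bytes_per_component


def format_size(num_bytes: int) -> str:
    """Render a byte count in decimal megabytes and binary mebibytes."""
    megabytes = num_bytes / 1_000_000
    mebibytes = num_bytes / (1024 * 1024)
    return f"{megabytes:.0f} MB ({mebibytes:.0f} MiB)"


size = flat_index_bytes(250_000)
print(format_size(size))  # prints "512 MB (488 MiB)"
```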
If someone has a lot of pictures scanned (~250k) then the memory usage of faiss can be larger than the 8GB RAM recommended while building the index.
The team at faiss already wrote some documentation: https://github.com/facebookresearch/faiss/wiki/Lower-memory-footprint
I find the docs kind of dense, but maybe someone with a background in machine learning will understand how to apply it.