LibrePhotos / librephotos

A self-hosted open source photo management service. This is the repository of the backend.
MIT License

Implement faiss with low memory footprint #336

Closed. derneuere closed this issue 3 years ago

derneuere commented 3 years ago

If someone has scanned a lot of pictures (~250k), Faiss can use more memory while building the index than the recommended 8 GB of RAM.

The team at faiss already wrote some documentation: https://github.com/facebookresearch/faiss/wiki/Lower-memory-footprint

I find the docs kind of dense, but maybe someone with a background in machine learning will understand how to apply it.

derneuere commented 3 years ago

@akshay9 Maybe that's an enhancement you can implement? 😄

akshay9 commented 3 years ago

Let me see during the weekend what I can do 🙂

derneuere commented 3 years ago

@akshay9 I tried to debug the problem. The main problem is this function: https://github.com/LibrePhotos/librephotos/blob/73a7032a18d25cc7f0e6f4ea7da18c9d50d8e09a/api/image_similarity.py#L58-L80 We load potentially all pictures into RAM. I will patch it to load at most 2.5k images for now.

I think we have to find out how to incrementally build the index and then check again how much memory Faiss needs and if we can improve the overhead.

akshay9 commented 3 years ago

That's great, let me know if that works. I'll still implement the lower memory footprint model of Faiss; it might help with an RPi server.

derneuere commented 3 years ago

To-Do: Add a loop to send batches to faiss. If we only index 2500 images, then the semantic search will also only search within these images.
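A minimal sketch of what such a loop could look like, not the actual LibrePhotos code. The names `batched`, `index_in_batches`, and `add_batch` are hypothetical; `add_batch` stands in for whatever call hands one batch to the index, so that only one batch of embeddings has to be held in memory at a time while every image still ends up searchable.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

Embedding = Sequence[float]


def batched(items: Iterable[Embedding], batch_size: int) -> Iterator[List[Embedding]]:
    """Yield consecutive lists of at most batch_size items from an iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def index_in_batches(
    embeddings: Iterable[Embedding],
    add_batch: Callable[[List[Embedding]], None],
    batch_size: int = 2500,
) -> int:
    """Send all embeddings to the index in batches; return how many were sent.

    add_batch is a hypothetical callback that adds one batch to the index.
    The embeddings iterable can be lazy (for example a database cursor),
    so the full set is never materialized at once by this function.
    """
    total = 0
    for batch in batched(embeddings, batch_size):
        add_batch(batch)
        total += len(batch)
    return total
```

With Faiss, `add_batch` could be something like `lambda b: index.add(np.asarray(b, dtype="float32"))` for an index created with `faiss.IndexFlatL2(512)`. This only bounds the memory used while loading; the index itself still grows with the number of images.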

250k images should take about 250000 × 512 (emb size) × 4 bytes (fp32) = ~500 MB of model RAM usage
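The estimate above can be checked with a few lines of arithmetic. This sketch assumes a flat, uncompressed index that stores every raw fp32 vector and ignores any overhead from Faiss itself or from loading the embeddings; the function name `flat_index_bytes` is made up for illustration.

```python
def flat_index_bytes(num_images: int, emb_size: int = 512, bytes_per_value: int = 4) -> int:
    """Bytes needed to store num_images raw embeddings (fp32 by default)."""
    return num_images * emb_size * bytes_per_value


size = flat_index_bytes(250_000)
print(f"{size} bytes = {size / 1e6:.0f} MB = {size / 2**20:.1f} MiB")
```

Under these assumptions the raw vectors for 250k images come to 512,000,000 bytes, which matches the rough 500 MB figure and sits well below the 8 GB mentioned at the top of the issue.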