martinenkoEduard opened this issue 2 years ago

What database/index search engine does CompreFace use? Where are the vectors stored? Does it need to reload/recompute all the vectors after a restart? How much memory does each vector use?

Does it use FAISS or something similar for index search, or just a plain database? How does it search for similar vectors? Is it horizontally scalable, or must all vectors be on one machine?
CompreFace stores all vectors in PostgreSQL. When you first use the recognition service, it caches them on the API nodes and then uses Euclidean distance to compute the similarities. So yes, it is not supposed to be used with millions of images. Here is the discussion about it: https://github.com/exadel-inc/CompreFace/issues/776
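For illustration, here is a minimal sketch of what such a brute-force scan over cached embeddings could look like; the `Candidate` record, its field names, and the helper methods are hypothetical, not CompreFace's actual classes:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical record pairing a subject name with its cached embedding.
record Candidate(String subject, double[] embedding) {}

public class LinearScan {
    // Squared Euclidean distance; the square root is unnecessary for ranking.
    static double sqDist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Return the cached candidate closest to the query embedding
    // by scanning every vector in the in-memory cache.
    static Candidate nearest(double[] query, List<Candidate> cache) {
        return cache.stream()
                .min(Comparator.comparingDouble(c -> sqDist(query, c.embedding())))
                .orElseThrow();
    }
}
```

This linear scan is O(n) per query, which is why such an approach works for tens of thousands of faces but not millions.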
What engine does it use to search vectors? Does adding new nodes help handle more faces?
Does it need to rescan all vectors after a restart?
What is the maximum number of faces anyone has achieved on this system?
I'm not sure I got all your questions correctly. We use the Nd4j library to find the Euclidean distance. When you add more nodes, you can achieve more requests per second, not more face examples. It doesn't need to rescan all vectors after a restart, as it saves them in the DB. We successfully use 50,000 faces in production; I think 100k–200k should work OK.
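For reference, this is roughly what computing that distance with Nd4j looks like — a self-contained sketch with made-up 3-dimensional vectors (real face embeddings are much higher-dimensional):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.ops.transforms.Transforms;

public class DistanceExample {
    public static void main(String[] args) {
        // Two hypothetical face embeddings; values are placeholders.
        INDArray a = Nd4j.create(new double[]{0.12, -0.40, 0.88});
        INDArray b = Nd4j.create(new double[]{0.10, -0.35, 0.90});

        // Euclidean (L2) distance between the two vectors; smaller = more similar.
        double dist = Transforms.euclideanDistance(a, b);
        System.out.println("Euclidean distance: " + dist);
    }
}
```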
I mean, where are the vectors themselves stored? Are they kept in RAM in Nd4j, or do you use some kind of index search engine (like FAISS or Milvus) to search by Euclidean distance? Or do you use the PostgreSQL database to search by Euclidean distance (does it even support that)?
For permanent storage they are kept in PostgreSQL, and then they are cached in RAM. We then use Nd4j to calculate the Euclidean distance. We do not use any search engine.
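To make the flow concrete, here is a minimal sketch of that load-once-then-cache pattern over JDBC; the `embeddings` table, its columns, and the packed-`bytea` serialization are assumptions for illustration, not CompreFace's actual schema:

```java
import java.nio.ByteBuffer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

public class EmbeddingCache {
    // In-memory cache populated once at startup; after a restart, vectors are
    // simply reloaded from the database rather than recomputed from images.
    static final Map<String, double[]> CACHE = new HashMap<>();

    static void warmUp(String jdbcUrl) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement st = conn.createStatement();
             // Hypothetical table/column names for this sketch.
             ResultSet rs = st.executeQuery("SELECT subject, embedding FROM embeddings")) {
            while (rs.next()) {
                // Assume each embedding is stored as a bytea of packed doubles.
                byte[] raw = rs.getBytes("embedding");
                double[] vec = new double[raw.length / Double.BYTES];
                ByteBuffer buf = ByteBuffer.wrap(raw);
                for (int i = 0; i < vec.length; i++) {
                    vec[i] = buf.getDouble();
                }
                CACHE.put(rs.getString("subject"), vec);
            }
        }
    }
}
```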