martinenkoEduard opened this issue 2 years ago

What database/index search engine does CompreFace use? Where are the vectors stored? Does it need to reload/recompute all the vectors after a restart? How much memory does each vector use?

Does it use FAISS or something similar for index search, or just a plain database? How does it search for similar vectors? Is it horizontally scalable, or must all vectors be on one machine?
CompreFace stores all vectors in PostgreSQL. When you first use the recognition service, it caches them on the API nodes and then uses Euclidean distance to compute the similarities. So yes, it is not supposed to be used with millions of images. Here is the discussion about it: https://github.com/exadel-inc/CompreFace/issues/776
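For illustration, here is a minimal sketch of what such a brute-force scan over cached embeddings could look like; the `Candidate` record, its field names, and the helper methods are hypothetical, not CompreFace's actual classes:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical record pairing a subject name with its cached embedding.
record Candidate(String subject, double[] embedding) {}

public class LinearScan {
    // Squared Euclidean distance; the square root is unnecessary for ranking.
    static double sqDist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Return the cached candidate closest to the query embedding
    // by scanning every vector in the in-memory cache.
    static Candidate nearest(double[] query, List<Candidate> cache) {
        return cache.stream()
                .min(Comparator.comparingDouble(c -> sqDist(query, c.embedding())))
                .orElseThrow();
    }
}
```

This linear scan is O(n) per query, which is why such an approach works for tens of thousands of faces but not millions.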
What engine does it use to search vectors? Does adding new nodes help handle more faces?
Does it need to rescan all vectors after a restart?
What is the maximum number of faces anyone has achieved on this system?
I'm not sure I got all your questions correctly. We use the Nd4j library to find the Euclidean distance. When you add more nodes, you can achieve more requests per second, not more face examples. It doesn't need to rescan all vectors after a restart, as it saves them in the DB. We successfully use 50,000 faces in production; I think 100k–200k should work OK.
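For reference, this is roughly what computing that distance with Nd4j looks like — a self-contained sketch with made-up 3-dimensional vectors (real face embeddings are much higher-dimensional):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.ops.transforms.Transforms;

public class DistanceExample {
    public static void main(String[] args) {
        // Two hypothetical face embeddings; values are placeholders.
        INDArray a = Nd4j.create(new double[]{0.12, -0.40, 0.88});
        INDArray b = Nd4j.create(new double[]{0.10, -0.35, 0.90});

        // Euclidean (L2) distance between the two vectors; smaller = more similar.
        double dist = Transforms.euclideanDistance(a, b);
        System.out.println("Euclidean distance: " + dist);
    }
}
```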
I mean, where are the vectors themselves stored? Are they kept in RAM in Nd4j, or do you use some kind of index search engine (like FAISS or Milvus) to search by Euclidean distance? Or do you use the PostgreSQL database to search by Euclidean distance (does it even support that)?
For permanent storage they are kept in PostgreSQL, and then they are cached in RAM. We then use Nd4j to calculate the Euclidean distance. We do not use any search engine.
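To make the flow concrete, here is a minimal sketch of that load-once-then-cache pattern over JDBC; the `embeddings` table, its columns, and the packed-`bytea` serialization are assumptions for illustration, not CompreFace's actual schema:

```java
import java.nio.ByteBuffer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

public class EmbeddingCache {
    // In-memory cache populated once at startup; after a restart, vectors are
    // simply reloaded from the database rather than recomputed from images.
    static final Map<String, double[]> CACHE = new HashMap<>();

    static void warmUp(String jdbcUrl) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement st = conn.createStatement();
             // Hypothetical table/column names for this sketch.
             ResultSet rs = st.executeQuery("SELECT subject, embedding FROM embeddings")) {
            while (rs.next()) {
                // Assume each embedding is stored as a bytea of packed doubles.
                byte[] raw = rs.getBytes("embedding");
                double[] vec = new double[raw.length / Double.BYTES];
                ByteBuffer buf = ByteBuffer.wrap(raw);
                for (int i = 0; i < vec.length; i++) {
                    vec[i] = buf.getDouble();
                }
                CACHE.put(rs.getString("subject"), vec);
            }
        }
    }
}
```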