ageitgey / face_recognition

The world's simplest facial recognition api for Python and the command line
MIT License

MySQL or Pickle for Recognition in Large Dataset #1168

Open pedromoraesh opened 4 years ago

pedromoraesh commented 4 years ago

Thanks for the lib @ageitgey :) Good Work!

Description

Hey guys, I need to program routines that run at midnight to update face identities in my database. I have about 1M known faces, and each time a person enters my store I save their face encodings for later recognition.

Two questions:

I've read #238 but didn't get much insight into the problem there, because the discussion went toward the PostgreSQL CUBE extension.

Ucnt commented 4 years ago

I am playing around with a similar project, but just for fun, and am using MySQL to store all of the picture/face data and doing all of the processing via Python scripts.

Below are some lessons learned (still trying to optimize, like you):

  1. If you save it to MySQL, you will have to deal with encoding/decoding. i.e. to insert into MySQL, I had to insert json.dumps(face_encoding.tolist()) as a TEXT/VARCHAR. Then, reading it out, np.array(json.loads(face_encoding.decode("utf-8"))) gets it back to a numpy array that can be operated on (there is a short sketch of this round trip after this list).

  2. Loading 1M face encodings will take up ~3 GB of RAM.

    • Estimated via htop from the ~610,000 encodings plus 38-character UUIDs that I am loading.
  3. The most computationally expensive thing you will do is run through all 1M of the faces looking for a match. That is an O(n) operation (i.e. on 1M faces, you may have to run through all 1M to find a match or to see that it's a new face). It may be good to group the encodings into known sets so you only really have to match 1 encoding per set against the new face. Even then, you will still likely have to compare hundreds of thousands of faces.

  4. It's already taking 1.5-2 seconds to match a face against one of 26k sets (i.e. 26k pictures). Extrapolating from that, it could take >10 seconds to match one new face against a few hundred thousand pictures. I'm not sure how many new pictures you're adding per day, but that might not scale well.
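For point 1, the round trip between numpy and MySQL looks roughly like the sketch below. The known_faces table, its columns, and the pymysql driver are my assumptions for illustration, not anything from the library.

```python
# Rough sketch of the encode/decode round trip from point 1 above.
# The known_faces table, its columns, and the pymysql driver are assumptions.
import json
import uuid

import numpy as np
import pymysql
import face_recognition

# Encode one face and give it an ID.
image = face_recognition.load_image_file("person.jpg")
face_encoding = face_recognition.face_encodings(image)[0]
face_uuid = str(uuid.uuid4())

conn = pymysql.connect(host="localhost", user="user", password="pass", database="faces")
cur = conn.cursor()

# Insert: numpy array -> list -> JSON text stored in a TEXT/VARCHAR column.
cur.execute(
    "INSERT INTO known_faces (uuid, encoding) VALUES (%s, %s)",
    (face_uuid, json.dumps(face_encoding.tolist())),
)
conn.commit()

# Read back: JSON text -> list -> numpy array that can be compared again.
cur.execute("SELECT encoding FROM known_faces WHERE uuid = %s", (face_uuid,))
(raw,) = cur.fetchone()
if isinstance(raw, bytes):  # some drivers return bytes instead of str
    raw = raw.decode("utf-8")
stored_encoding = np.array(json.loads(raw))
```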

pedromoraesh commented 4 years ago

@Ucnt Thanks for the Reply!

Very good points!

One thing I would like to try is improving the recognition accuracy by applying the arccos (angular) distance between the encodings, as in the ArcFace paper. But since the arccos calculation is more complex, it may add some penalty to the recognition time.
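Computationally it is just a dot product plus an arccos; a minimal sketch (names are illustrative, this is not part of the face_recognition API):

```python
# Sketch of an angular (arccos) distance between two 128-d encodings.
import numpy as np

def angular_distance(enc_a, enc_b):
    """Angle in radians between two encodings; smaller means more similar."""
    cos_sim = np.dot(enc_a, enc_b) / (np.linalg.norm(enc_a) * np.linalg.norm(enc_b))
    return np.arccos(np.clip(cos_sim, -1.0, 1.0))  # clip guards against rounding error
```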

I ran some tests with the LFW dataset (~13k faces); it took 2 seconds running on my laptop with a GTX 1060.

Last point: did you try sending those encodings to the GPU to speed up face matching? Maybe it's an option.

Nice work with S3 btw, I've been working with it too. I need to program some listeners to process new uploads; can I email you with some questions?

Ucnt commented 4 years ago

I haven't tried GPU matching, but the benefit may be somewhat negligible since the sheer quantity of images is likely the biggest factor.

Thanks, and ya, feel free. It's always nice to bounce things off of others.

rathishkumar commented 3 years ago

4. It's already taking 1.5-2 seconds to match a face to one of 26k sets

@Ucnt Thanks for the insights.

I would like to add that, after trying MySQL for some time, I switched to PostgreSQL and am getting responses in <100ms for 1:25K encodings using the CUBE datatype and indexing.
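For anyone curious, a nearest-neighbour lookup against a CUBE column can be issued from Python roughly like the sketch below. Table/column names and connection details are assumptions, and note that the cube extension is limited to 100 dimensions by default (CUBE_MAX_DIM), so 128-d encodings may require rebuilding the extension with a larger limit or reducing dimensionality.

```python
# Sketch of a nearest-neighbour lookup against a PostgreSQL CUBE column.
# Table/column names and connection details are assumptions.
import face_recognition
import psycopg2

image = face_recognition.load_image_file("new_face.jpg")
new_encoding = face_recognition.face_encodings(image)[0]

conn = psycopg2.connect("dbname=faces")
cur = conn.cursor()

# <-> is the Euclidean distance operator from the cube extension; with a GiST
# index on the encoding column, the ORDER BY ... LIMIT query can use the index.
cur.execute(
    """
    SELECT person_id, encoding <-> cube(%s) AS dist
    FROM known_faces
    ORDER BY dist
    LIMIT 1
    """,
    ([float(x) for x in new_encoding],),
)
person_id, dist = cur.fetchone()
```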

pedromoraesh commented 3 years ago

I've been using MySQL with Elasticsearch, which uses NMSLIB to compare faces; even with millions and millions of records it takes less than a second per comparison. I currently have about 20M+ and still do it in under a second.

A local solution can also be coded with KNN and BallTrees, using cosine distance to compare the faces. It is about as fast as NMSLIB.
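Something along these lines with scikit-learn, for example. One caveat: BallTree does not accept cosine distance directly, but on L2-normalized vectors Euclidean distance ranks neighbours the same way, so the sketch below normalizes first (the file name and threshold are illustrative assumptions):

```python
# Sketch of a local KNN lookup over stored encodings with a BallTree.
# scikit-learn's BallTree has no cosine metric, so vectors are L2-normalized
# and queried with the default Euclidean metric, which preserves the ranking.
import numpy as np
from sklearn.neighbors import BallTree

known_encodings = np.load("known_encodings.npy")   # assumed (N, 128) array
norms = np.linalg.norm(known_encodings, axis=1, keepdims=True)
tree = BallTree(known_encodings / norms)

def match(new_encoding, threshold=0.6):
    """Index of the closest known face, or None if nothing is close enough."""
    q = new_encoding / np.linalg.norm(new_encoding)
    dist, idx = tree.query(q.reshape(1, -1), k=1)
    return int(idx[0][0]) if dist[0][0] <= threshold else None
```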