ageitgey / face_recognition

The world's simplest facial recognition api for Python and the command line
MIT License

How can I make face recognition faster if I have more than 1M known images? #238

Closed khaledabbad closed 6 years ago

khaledabbad commented 6 years ago

I tried to use your amazing project for face recognition (a single unknown image, for example) against my large set of known images (1 million), but it's really slow. It's slow because it loads all known images (load_image_file -> face_encodings) in order to compare them with ONE unknown image file.

Any ideas on how to speed up this process? I was thinking of running face_encodings on all the known images and then saving the 128 values as a string in Apache Solr, but with no luck, as I would still need to run compare_faces against all known images :) ... Any suggestions?

ageitgey commented 6 years ago

You could create a database table (postgres, mysql, etc.) with 128 columns and store the pre-calculated 1M encodings in that table. Then you could do the compare_faces math using SQL against that table to check one face.
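For example, a rough sketch of what pre-computing and storing the encodings might look like with psycopg2 (table and column names are just placeholders):

```python
# Sketch: pre-compute each known face's encoding once and store it in a
# 128-column table. Table/column names here are placeholders.
import face_recognition
import psycopg2

conn = psycopg2.connect(host="localhost", database="faces",
                        user="postgres", password="password")
cur = conn.cursor()

# one float column per encoding dimension: e0 .. e127
col_defs = ", ".join("e%d double precision" % i for i in range(128))
cur.execute("CREATE TABLE IF NOT EXISTS my_stored_encodings (id serial PRIMARY KEY, %s)" % col_defs)

image = face_recognition.load_image_file("known_person.jpg")
encoding = face_recognition.face_encodings(image)[0]  # a 128-number numpy array

col_names = ", ".join("e%d" % i for i in range(128))
placeholders = ", ".join(["%s"] * 128)
cur.execute("INSERT INTO my_stored_encodings (%s) VALUES (%s)" % (col_names, placeholders),
            [float(x) for x in encoding])  # numpy.float64 -> plain float for psycopg2
conn.commit()
conn.close()
```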

khaledabbad commented 6 years ago

Thank you for the quick reply. How can I translate your compare_faces, since it uses np.linalg.norm, into SQL? Do you have a real example?

ageitgey commented 6 years ago

The formula for euclidean distance is just:

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (p128 - q128)^2)

So assuming you had one column for each of the 128 feature values, you could do something like:

```sql
SELECT * FROM my_stored_encodings
ORDER BY
    sqrt(
        power(e1 - TEST_ENCODING_VALUE_0_HERE, 2) +
        power(e2 - TEST_ENCODING_VALUE_1_HERE, 2) +
        power(..... etc ....)
    )
```

If you are using PostgreSQL, you can do more complex things, like using its built-in list data types to store the 128-number encoding in one column and doing the comparison with a custom stored function. Just google around for "euclidean distance in sql".
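Writing out all 128 terms by hand is tedious, so the query can also be generated. A minimal sketch, assuming the e0..e127 column layout from the example above:

```python
# Sketch: generate the full 128-term euclidean distance query instead of
# writing it out by hand. Assumes columns named e0 .. e127.
def build_distance_query(table="my_stored_encodings", limit=1):
    terms = " + ".join("power(e%d - %%s, 2)" % i for i in range(128))
    return ("SELECT *, sqrt(%s) AS distance FROM %s ORDER BY distance LIMIT %d"
            % (terms, table, limit))

# Usage: pass the unknown face's 128 encoding values as query parameters.
# cur.execute(build_distance_query(), [float(x) for x in unknown_encoding])
```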

khaledabbad commented 6 years ago

Many thanks :)

ramineniraviteja commented 6 years ago

You can use numpy to calculate the Euclidean distance between two vectors: `dist = numpy.linalg.norm(vector_a - vector_b)`
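If all of the known encodings fit in memory, the same call can be vectorized over every known face at once (this is essentially what face_recognition.face_distance does). A small sketch with stand-in data:

```python
import numpy as np

# known_encodings: an (N, 128) array of pre-computed encodings
# unknown: the (128,) encoding of the face to look up
known_encodings = np.random.rand(1000000, 128)  # stand-in data for the sketch
unknown = np.random.rand(128)

distances = np.linalg.norm(known_encodings - unknown, axis=1)  # one distance per known face
best_match = int(np.argmin(distances))  # row index of the closest known face
```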

khaledabbad commented 6 years ago

Thank you all. I've indexed all the image encodings into Apache Solr and then computed the euclidean distance using Solr's built-in dist function, i.e. http://localhost:8983/solr/mycore/select?q=*:*&fl=dist(2,v_0,v_1,v_3,...,v_127,-0.0621345,0.048437204,0.0839613,...)

So far, I have indexed around 40K images and the query speed is very good (17ms, without any Solr cache).
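For anyone who wants to issue that kind of query from Python, here is a rough sketch using requests (it assumes encoding fields named v_0 .. v_127 as above; the exact parameters depend on your Solr schema):

```python
# Sketch: rank Solr documents by euclidean distance to an unknown encoding
# using Solr's built-in dist() function. Assumes fields v_0 .. v_127.
import requests

def solr_closest_face(encoding, url="http://localhost:8983/solr/mycore/select"):
    fields = ",".join("v_%d" % i for i in range(128))
    values = ",".join(str(float(x)) for x in encoding)
    dist_fn = "dist(2,%s,%s)" % (fields, values)
    params = {
        "q": "*:*",
        "fl": "*,d:%s" % dist_fn,   # return each doc with its distance as pseudo-field "d"
        "sort": "%s asc" % dist_fn,  # closest faces first
        "rows": 1,
        "wt": "json",
    }
    return requests.get(url, params=params).json()
```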

r04drunn3r commented 6 years ago

@khaledabbad : Could you please elaborate on how you managed to do this? I'm working on something very similar.

khaledabbad commented 6 years ago

I used an Apache Solr filter query with the dist function.

mlanandnair commented 6 years ago

@khaledabbad & @ageitgey: How much accuracy can we expect if we go with the euclidean distance calculation rather than the knn classifier in the case of a large set of image data? ----> https://github.com/ageitgey/face_recognition/blob/master/examples/face_recognition_knn.py As I tested it, the knn classifier is much more promising in terms of accuracy. Can we expect that much accuracy, or is this euclidean distance concept used inside the knn classifier?

xenc0d3r commented 6 years ago

hello @ageitgey
thanks for the great work. I am saving the face_encodings into the database in one column, but I am not sure what data type I should select for the column. bigint? bigserial? Can you help please?

khaledabbad commented 6 years ago

I don't think it will be a good idea to save all the face features (encodings) in one column; try adding 128 columns with a float data type, i.e. Column0 float(22,20), Column1 float(22,20), ... Column127 float(22,20).

xenc0d3r commented 6 years ago

@khaledabbad but I am using a postgresql database. @ageitgey mentioned that we can store the encodings in 1 column if we use postgres.

ageitgey commented 6 years ago

@xenc0d3r In postgres, you can optionally store them in a single column using the CUBE extension. However, postgres has a 100-dimension limit on cube fields by default, and since the face encodings are 128 dimensions, you have to edit a file and recompile postgres yourself if you want to do it that way.

See the other thread: https://github.com/ageitgey/face_recognition/issues/403#issuecomment-373437405
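For reference, once postgres is rebuilt with the higher limit, the single-column setup looks roughly like this (a sketch; the table and column names are just placeholders):

```python
# Sketch: single-column cube storage, assuming postgres has been rebuilt so
# the cube type allows 128 dimensions (see the linked thread).
import psycopg2

conn = psycopg2.connect(host="localhost", database="postgres",
                        user="postgres", password="password")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS cube")
cur.execute("""CREATE TABLE IF NOT EXISTS wanted (
                   id serial PRIMARY KEY,
                   first_name text,
                   last_name text,
                   face_encoding cube)""")
# a GiST index lets the <-> nearest-neighbour operator use an index scan
cur.execute("CREATE INDEX IF NOT EXISTS wanted_encoding_idx "
            "ON wanted USING gist (face_encoding)")
conn.commit()
conn.close()
```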

xenc0d3r commented 6 years ago

@ageitgey thank you for your response. So I decided to save them as multiple columns. I know you are busy with other works but can you give a sql query example how to compare a uploaded image with the saved encodings in python.

Thanks in advance.

ageitgey commented 6 years ago

@xenc0d3r There's an example higher in this thread: https://github.com/ageitgey/face_recognition/issues/238#issuecomment-345847465

You'll just have to write out all 128 terms in the sql statement instead of only the 2 I put in the short example.

xenc0d3r commented 6 years ago

Hello @ageitgey @khaledabbad. When I try to save the encoded list to postgresql with python psycopg2, I receive the following error: `TypeError: 'numpy.float64' object does not support indexing`

I have 128 columns for the encoded list elements in my database table, and their data types are float. Can you help me please?

Best Regards

xenc0d3r commented 6 years ago

@ageitgey @khaledabbad my code is as :

```python
test = list(encoded_photo)
for row in test:
    # "row" here is a single numpy.float64, not a sequence of 128 values,
    # which is what triggers the TypeError above
    c.execute("""INSERT INTO photos VALUES (DEFAULT, %s, %s, %s ---128 times);""", row)
```

mmelatti commented 6 years ago

I had success inserting with this (Python code): `INSERT INTO test (face_encoding) VALUES ('" + face_encoding_string + "')`, just changing the values to a string and inserting into the postgres 128-dimension cube column. In Python: `face_encoding_string = "(0.1,0.1,0.1 .... 0.1,0.1)"`

For me I don't need to optimize when filling database so this works in my case. I just need optimized querying for when I move towards analyzing video streams for face recognition.

xenc0d3r commented 6 years ago

@mmelatti can you help me about this. can we discuss instantly on twitter etc. thank you

mmelatti commented 6 years ago

Don't have twitter atm, sorry. I'll post my code and see if it can help you. I'm using psycopg2 as well. This isn't a final product, but you can see if it works for you:

```python
import face_recognition
import psycopg2

image = face_recognition.load_image_file(self.filename)
face_encoding = face_recognition.face_encodings(image)[0]  # only 1 face expected when entering a 'mugshot'

# build a cube-style literal like "(0.1,0.2,...)" from the 128 encoding values
face_encoding_string = "("
for distance in face_encoding:
    face_encoding_string += str(distance)
    face_encoding_string += ","
face_encoding_string = face_encoding_string[:-1]
face_encoding_string += ")"
print(face_encoding_string)  # demo what a face encoding looks like in the output

conn = psycopg2.connect(host="localhost", database="postgres", user="postgres", password="password")
cur = conn.cursor()
cur.execute("INSERT INTO wanted (first_name, last_name, face_encoding) VALUES ('" + self.entry_1.get() + "', '" + self.entry_2.get() + "', '" + face_encoding_string + "')")
conn.commit()
conn.close()
```

xenc0d3r commented 6 years ago

@mmelatti thank you so much for your help. regards

xenc0d3r commented 6 years ago

@mmelatti and can you provide the SQL query you are using to compare images? Also, you said you used the cube extension, but that has a limit of 100 dimensions as far as I know. Can I still use it?

mmelatti commented 6 years ago

Yes, it's limited to 100 by default, but in this post I show my method for changing cube to 128 dimensions: https://github.com/ageitgey/face_recognition/issues/403#issuecomment-374336850

I have a link to download the postgres source and instructions for changing the cube data type to 128 dimensions. I'm working on my query atm; I'll share that code as soon as I finish it.

It'll basically look like this: `SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1;` See: https://www.postgresql.org/docs/10/static/cube.html

UPDATE: Finished my query. Same method for finding the face encoding in a new picture; then I query that against my database. Now I need to stress test it, test accuracy, and test thresholds, i.e. if there isn't a face that closely resembles one in the database, we shouldn't return anything (unknown face).

Code: (Python 3)

```python
conn = psycopg2.connect(host="localhost", database="postgres", user="postgres", password="password")
cur = conn.cursor()

tempstring = "SELECT first_name FROM wanted ORDER BY face_encoding <-> cube(array[" + face_encoding_string + "]) LIMIT 1"
cur.execute(tempstring)
print(cur.fetchall())
```
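For the unknown-face threshold, one option is to return the distance alongside the match and reject anything above the library's default 0.6 tolerance. A sketch reusing the names above (it assumes face_encoding_string has the "(0.1,...)" form from the insert code):

```python
# Sketch: fetch the nearest match *and* its distance, then apply a cutoff
# (0.6 is face_recognition's default tolerance) to reject unknown faces.
cur.execute("SELECT first_name, face_encoding <-> cube(array[" +
            face_encoding_string.strip("()") + "]) AS dist "
            "FROM wanted ORDER BY dist LIMIT 1")
row = cur.fetchone()
if row is not None and row[1] <= 0.6:
    print("Match:", row[0])
else:
    print("Unknown face")
```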

xenc0d3r commented 6 years ago

@mmelatti thank you. I am trying to install it.

xenc0d3r commented 6 years ago

@mmelatti I modified and installed version 10.3, but I think I cannot start the server with the service postgresql start command. Everything is tangled here.. :)

mmelatti commented 6 years ago

did you add a new user (`$ adduser postgres`)? did you switch to that user to issue the start command?

I'm on mac; if it doesn't work exactly for you, I'd follow the README provided in the download for installation.

You can also try this docker installation: https://github.com/oelmekki/postgres-350d

`FATAL: role "faceadmin" does not exist`: seems the role was not created correctly. Try using the steps in the README?

xenc0d3r commented 6 years ago

@mmelatti I receive FATAL: role "faceadmin" does not exist

xenc0d3r commented 6 years ago

@mmelatti Ok, I finally got it. It is working.

xenc0d3r commented 6 years ago

@mmelatti can you also send the code where you inserted the encoded array into the database? And what data type did you set for it? thank you

xenc0d3r commented 6 years ago

@mmelatti I inserted successfully into the database, but when I try to compare the encodings I get the following error:

```
LINE 1: ...first_name FROM wanted ORDER BY face_encoding <-> cube(array...
                                                         ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
```

any ideas ?

mmelatti commented 6 years ago

I had that same error for a while. It went away when I changed the ORDER BY ____ to make sure it was the same column that I'm storing the cube data type in. It might help you to review the GiST index and the metric operators defined on the cube data type (10.3 docs page).

I believe the issue is with your <-> metric operator?

Also, double check and make sure you have at least 2 entries in your table; I've seen that be an issue for people.

Did you just make install cube, or did you make install all extensions? I ended up make installing all the extensions; not sure if this could be the root of your problem.

I don't believe you should remove "ORDER BY".

xenc0d3r commented 6 years ago

@mmelatti so should I remove the ORDER BY part? Can you help please? It does not recognize the cube command.

xenc0d3r commented 6 years ago

@mmelatti Are you storing the encoding in 128 different columns or in a single one?

mmelatti commented 6 years ago

1 column (cube data type), entered as 128 dimensions. It looks like this when entering: (0.1,0.1,0.1, ...., 0.1,0.1). A row contains the data entry for a single person: first and last name, id, and their face encoding (the face encoding fits in one column, aka one cell in my table).

That's the whole point of changing the data type and specifically using PostgreSQL instead of a typical database setup with 128 columns for the 128 face encoding values. By doing it this way, there is less to do on the application side. Also, I think my tables look nicer.

I am also going to experiment with a many-to-one relationship, with multiple face encodings (pictures) stored in my "wanted list" for a single person, and see what kind of results I get (in case someone has more than one "mugshot" in our database).

I still need to experiment with thresholds as well for "unknown faces". I'll be adding my example code and detailed setup later.

xenc0d3r commented 6 years ago

I receive an error while encoding. Any ideas ? @ageitgey @khaledabbad

File "/usr/local/lib/python3.6/dist-packages/face_recognition/api.py", line 197, in face_encodings raw_landmarks = _raw_face_landmarks(face_image, known_face_locations, model="small") File "/usr/local/lib/python3.6/dist-packages/face_recognition/api.py", line 151, in _raw_face_landmarks face_locations = _raw_face_locations(face_image) File "/usr/local/lib/python3.6/dist-packages/face_recognition/api.py", line 100, in _raw_face_locations return face_detector(img, number_of_times_to_upsample) MemoryError: std::bad_alloc

badory commented 6 years ago

Hello @mmelatti

I would like to thank you first for sharing the database for face recognition.

I had a problem when running LoadDatabase.py and I would appreciate if you please help to solve it.

```
$ python LoadDatabase1.py
/Users/Badr/.virtualenvs/cv/lib/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
  """)
Exception in Tkinter callback
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk/Tkinter.py", line 1541, in __call__
    return self.func(*args)
  File "LoadDatabase1.py", line 81, in SubmitDataBase
    image = face_recognition.load_image_file(self.filename)
AttributeError: LoadDatabase instance has no attribute 'filename'
```

mmelatti commented 6 years ago

@badory happy to help. First of all, if you're getting Tkinter errors: Tkinter is just used for the GUI. I also included a command line interface that does pretty much the same thing, so if you get Tkinter errors I would try using the CLI in the /command_line_interface folder. As for the psycopg2 error, that library is required for connecting to the PostgreSQL database. It seems like you're having problems with the wheel package. Have you tried the steps to install it? sudo apt-get update, sudo apt-get install pip2 or pip3 (whichever python version), pip install --upgrade pip; pip install psycopg2; make sure it's in your python virtual environment. Maybe try uninstalling and reinstalling? I'm not sure if you're using python's virtualenv or what version of python you're using. I believe I was using python 2.7.

Online, this seems to be the solution for updating psycopg2: `sudo apt-get install pip3`, `sudo apt-get install libpq-dev`, `sudo pip3 install psycopg2`

badory commented 6 years ago

@mmelatti Thanks for the suggestions. I've already used the CLI that you provided, and successfully loaded a pic into the database and recognized the pic too. However, I still receive an error when loading the picture and using load.py, as follows:

```
Traceback (most recent call last):
  File "load_db.py", line 40, in <module>
    self.SetDefault()
NameError: name 'self' is not defined
```

I also just realized that the GUI works properly if I don't use the default picture and choose another picture. I installed psycopg2 and created a new environment with python3 installed. Yet I receive an error as follows:

```
Exception in Tkinter callback
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tkinter/__init__.py", line 1702, in __call__
    return self.func(*args)
  File "LoadDatabase1py3.py", line 98, in SubmitDataBase
    self.SetDefault()
  File "LoadDatabase1py3.py", line 65, in SetDefault
    self.entry_3.delete(0, Tkinter.END)
NameError: name 'Tkinter' is not defined
```

It looks like it gets confused between Tkinter (for Python 2) and tkinter (for Python 3). In other words, it runs Tkinter instead of tkinter even though I'm using python3, which I think is what caused the problem.

Thank you again.,

kaisark commented 6 years ago

@ageitgey - Did you read about the FB class action suit and the revised Facebook privacy terms coming out, where users opt in to face recognition features (e.g. auto-tagging)??? I think it had something to do with storing biometric (face) data...

https://techcrunch.com/2018/04/16/judge-says-class-action-suit-against-facebook-over-facial-recognition-can-go-forward/

MLDSBigGuy commented 6 years ago

@mmelatti, thanks for your code :) Are you using knn search anywhere in the database? Why not store ML models of the encodings in the database rather than the encodings themselves? Isn't the search time lower with models?

mmelatti commented 6 years ago

@MLDSBigGuy I haven't tested the database with models instead of encodings; however, the PostgreSQL database is a spatial database that I assume is optimized for the cube data type. I haven't personally tested a database with 50+ million entries, but I believe other people have commented in issues in this face_recognition repo about testing databases of that scale and did not mention search time as a major issue. I think when you have that many faces in the database you run into other issues. These issues are more related to the accuracy of the API, whose training data is not incredibly diverse. Unless you're Google or Facebook, it's hard to get millions of tagged images. I believe this API's training used IMDB (Stanford study), which has proven to be most accurate with white faces. Furthermore, you're dealing with pixels, not vector images, so detail is lost; I believe you will see the best results when comparing photos of the same resolution. Please, anybody, correct me if I'm wrong; I may be confusing this with another API I was reading about.

MLDSBigGuy commented 6 years ago

Hi @mmelatti, Thank you for your reply,

I started with MongoDB to store the encodings. I pickled the numpy arrays and stored them into a mongo document. I am planning to compute the euclidean distance in application code after loading all the arrays from the database. Am I doing it correctly? Any ideas? Why did you not simply pickle the encoding and load all the encodings into application code for comparison?
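Roughly what I have in mind (a sketch, assuming pymongo; the collection and field names are just placeholders):

```python
# Sketch: pickle encodings into MongoDB, then brute-force the distances in
# application code. Collection/field names are placeholders.
import pickle
import numpy as np
from pymongo import MongoClient

db = MongoClient("localhost", 27017).face_db

def add_face(name, encoding):
    db.faces.insert_one({"name": name, "encoding": pickle.dumps(encoding)})

def best_match(unknown):
    docs = list(db.faces.find())
    encodings = np.array([pickle.loads(d["encoding"]) for d in docs])
    distances = np.linalg.norm(encodings - unknown, axis=1)
    i = int(np.argmin(distances))
    return docs[i]["name"], float(distances[i])
```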

mmelatti commented 6 years ago

For very large numbers, searching for an encoding in the database may be faster than comparing in application code (application code may just loop through entries, because we can't hash these encodings for lookup when we need the closest match; need confirmation on this?). I'm not as familiar with the specific details of the API. Additionally, I haven't tried it, but I'm not sure about the management side, i.e. how to remove entries from application code, so +/- on management?

MLDSBigGuy commented 6 years ago

@mmelatti thank you :) Could you please share the SQL code for how you calculated the euclidean distance in postgres? I see that query operations usually do not include a direct distance calculation feature. Mongo has array push/pop and other minimal operations, but no direct distance calculation query. I just want to learn from your code to do the same with mongo.

Thank you,

mmelatti commented 6 years ago

pretty sure an example of how the SQL should look is located right at the top of this thread (ageitgey posted):

```sql
SELECT * FROM my_stored_encodings
ORDER BY
    sqrt(
        power(e1 - TEST_ENCODING_VALUE_0_HERE, 2) +
        power(e2 - TEST_ENCODING_VALUE_1_HERE, 2) +
        power(..... etc ....)
    )
```

MLDSBigGuy commented 6 years ago

@mmelatti thank you very much 👍

How can you improve accuracy if you store the images in a database? Do we need to store more images of the same person?

Imagine I store 10 images of the same person; could accuracy be improved then?

Regarding speed: if I have 50,000 different faces, can my response still be in milliseconds if I check against each entry in the postgres database? (It doesn't matter even if I get wrong matches.)

I see that postgres implemented knn indexing for the cube <-> euclidean distance operator. Is this the same as training a knn machine learning model?

Thank you,

mmelatti commented 6 years ago

@MLDSBigGuy You might want to check out this website: https://hackernoon.com/building-a-facial-recognition-pipeline-with-deep-learning-in-tensorflow-66e7645015b8 (convolutional neural networks for face recognition).

You should still be able to get fast queries from a spatial database even with many entries in the db. The neural network converts feature maps to a vector space (the 128-dimension embedding); in vector space we can use vector distance to determine the similarity of identifications. I don't believe the knn indexing is the same as the API's knn classifier. Besides the obvious advantage of persistent data, I would take a look at some key differences here: https://pdfs.semanticscholar.org/237b/77328f6f1c75ba4fcdca131b0d95f6bb54b3.pdf https://en.wikipedia.org/wiki/Spatial_database https://dl.acm.org/citation.cfm?id=280279&dl=ACM&coll=DL

I don't believe storing 10 images of the same person in the database will improve accuracy. If done for all persons, you would be blurring the identities in the vector space, and it also seems counter-intuitive to the convolutional neural network except in special cases. I believe the best approach to improve accuracy, short of somehow creating a better and more accurate convolutional neural network, would be: do not increase the number of stored images per person, but instead analyze more frames of a person in real-time surveillance. I.e., if you are doing video surveillance, you can look at multiple frames of a person walking by and identify them across multiple frames and angles. The industry seems to be taking this approach for increased accuracy.

best,

MLDSBigGuy commented 6 years ago

Thank you very much @mmelatti for the links and detailed answer.

Yes, you are right; matching the person walking by through video surveillance is good 👍

I see that in your postgres setup there is no indexing on the encodings. Won't the query be much faster with indexing?

If so, can you please tell me whether I need to create the index when I create the table at the start, or when I query entries?

Thank you,

Asad2195 commented 5 years ago

How can I use face_recognition to load multiple images? Please guide me; I am new to Python. I've asked this in detail on Stack Overflow, please take a look and guide me: https://stackoverflow.com/questions/53042959/dynamically-store-images-via-face-recognition

mmelatti commented 5 years ago

@Asad2195 ageitgey/face_recognition provides exactly that functionality. See the example from the project (identify_and_draw_boxes_on_faces.py):

```python
import face_recognition
from PIL import Image, ImageDraw

# This is an example of running face recognition on a single image
# and drawing a box around each person that was identified.

# Load a sample picture and learn how to recognize it.
obama_image = face_recognition.load_image_file("obama.jpg")
obama_face_encoding = face_recognition.face_encodings(obama_image)[0]

# Load a second sample picture and learn how to recognize it.
biden_image = face_recognition.load_image_file("biden.jpg")
biden_face_encoding = face_recognition.face_encodings(biden_image)[0]

# Create arrays of known face encodings and their names
known_face_encodings = [
    obama_face_encoding,
    biden_face_encoding
]
known_face_names = [
    "Barack Obama",
    "Joe Biden"
]
```

This loads the known faces. This thread specifically covers how to store face encodings in a spatial database for querying likeness; I have a public repo that deals with a PostgreSQL spatial DB.
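To then match an unknown image against those known encodings, something like this (a sketch continuing the example; "unknown.jpg" is just a placeholder):

```python
# Continuing the example: compare an unknown image against the known encodings.
unknown_image = face_recognition.load_image_file("unknown.jpg")

for unknown_encoding in face_recognition.face_encodings(unknown_image):
    matches = face_recognition.compare_faces(known_face_encodings, unknown_encoding)
    distances = face_recognition.face_distance(known_face_encodings, unknown_encoding)

    name = "Unknown"
    best = int(distances.argmin())
    if matches[best]:
        name = known_face_names[best]
    print(name)
```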

Also check out ageitgey's knn examples.