face_recognition_knn.py question

hanckmail commented 6 years ago

Operating System: debian stretch

I want to build a database that contains about 5,000 images. I use face_recognition_knn.py and it works fine, but how can I organize it to make recognition faster? For example, when I import one photo, the search through the 5,000 images takes about 30 minutes; when I then import a second photo, it takes another 30 minutes. Can I 'index' those 5,000 photos or something like that? All 5,000 photos will stay in one folder permanently. I also did not really understand how to retrain the model.

P.S. Sorry for my English.

FiveMaster commented 6 years ago

Do you mean that recognition takes 30 minutes per photo? It should not be that slow. Can you show your recognition code, or all of your code? I will help you check this issue. You can also try multiple processes, which may make recognition faster.

hanckmail commented 6 years ago

No, 30 minutes per 5,000 photos, but I think my GPU is not involved in the recognition process.

hanckmail commented 6 years ago

Sorry, I think I found what I needed:

import face_recognition
import pickle
import numpy as np  # needed for np.array() further down

all_face_encodings = {}

img1 = face_recognition.load_image_file("obama.jpg")
all_face_encodings["obama"] = face_recognition.face_encodings(img1)[0]

img2 = face_recognition.load_image_file("biden.jpg")
all_face_encodings["biden"] = face_recognition.face_encodings(img2)[0]

# ... etc ...

with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(all_face_encodings, f)

# Load face encodings
with open('dataset_faces.dat', 'rb') as f:
    all_face_encodings = pickle.load(f)

# Grab the list of names and the list of encodings
face_names = list(all_face_encodings.keys())
face_encodings = np.array(list(all_face_encodings.values()))

# Try comparing an unknown image
unknown_image = face_recognition.load_image_file("obama_small.jpg")
# face_encodings() returns one encoding per detected face; take the first one
unknown_face = face_recognition.face_encodings(unknown_image)[0]
result = face_recognition.compare_faces(face_encodings, unknown_face)

# Print the result as a list of names with True/False
names_with_result = list(zip(face_names, result))
print(names_with_result)

I just need some help: how can I import a whole folder with about 3,000 photos named 1, 2, 3, 4, etc.? I also need only the 'True' results, not the whole list, or to have the 'True' faces opened.

Or how can I use my dataset instead of the train folder in face_recognition_knn.py, or instead of the people_i_know folder in face_recognition? Please, anybody, help.

P.S. In the future, if I add more files, how can I update "dataset_faces.dat" without deleting it? Do I have to change "wb"?
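One possible way to handle both the whole-folder import and the later updates is to treat the pickled dictionary as the index and encode only the files that are not in it yet. The sketch below is an illustration, not part of the face_recognition library: update_encodings and encode_first_face are hypothetical helper names, and it assumes each photo contains at least one face and that the file name without its extension is a good key.

```python
import os
import pickle


def encode_first_face(image_path):
    """Return the encoding of the first face in the image, or None if no face is found."""
    # Imported here so the rest of the sketch does not need the library at import time.
    import face_recognition

    image = face_recognition.load_image_file(image_path)
    encodings = face_recognition.face_encodings(image)
    return encodings[0] if encodings else None


def update_encodings(folder, db_path, encode=encode_first_face):
    """Add encodings for files in `folder` that are not yet stored in `db_path`."""
    # Load the existing dictionary if the file is already there.
    if os.path.exists(db_path):
        with open(db_path, "rb") as f:
            known = pickle.load(f)
    else:
        known = {}

    added = []
    for filename in sorted(os.listdir(folder)):
        name, ext = os.path.splitext(filename)
        if ext.lower() not in {".jpg", ".jpeg", ".png"} or name in known:
            continue  # not an image, or already encoded on an earlier run
        encoding = encode(os.path.join(folder, filename))
        if encoding is not None:
            known[name] = encoding
            added.append(name)

    # "wb" is still fine: the file is rewritten with the old and new entries together.
    with open(db_path, "wb") as f:
        pickle.dump(known, f)
    return added
```

Calling update_encodings("photos", "dataset_faces.dat") a second time after adding files would then encode only the new ones, so the mode "wb" does not need to change.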

hanckmail commented 6 years ago

Can anybody give an example of how to train the model in face_recognition_knn.py? For model_save_path = "", must it be a path or a file name? Please, guys, one example of a full working script with training and prediction. I didn't understand how to make it work.

FiveMaster commented 6 years ago

You just need to save the training results to a txt file the first time. Like this:

knn_clf = train("knn_examples/train", "./model/model.txt")

and then predict like this:

preds = predict(join("knn_examples/test", img_path), model_save_path="./model/model.txt")

If the training data remains the same, you don't need to train the model again.
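The train-once, load-afterwards pattern described here can be sketched in a few lines. This is only an illustration: load_or_train is a hypothetical helper, and train_fn stands in for a call such as train("knn_examples/train") from face_recognition_knn.py. Note that the saved file is a pickle, so the .txt ending is just a name.

```python
import os
import pickle


def load_or_train(model_path, train_fn):
    """Load a pickled model if it exists, otherwise train one and save it."""
    if os.path.isfile(model_path):
        with open(model_path, "rb") as f:
            return pickle.load(f)  # no training on this run

    model = train_fn()  # slow step: runs only when no saved model is found
    folder = os.path.dirname(model_path)
    if folder:
        os.makedirs(folder, exist_ok=True)  # e.g. create ./model/ on first use
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model
```

With this in place, something like knn_clf = load_or_train("./model/model.txt", lambda: train("knn_examples/train")) would train on the first run and only load on later runs; deleting the model file forces a retrain.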

hanckmail commented 6 years ago

Thank you very, very much, bro, now everything works perfectly. I have three more questions:
1 - How many faces can be stored in the txt model?
2 - If I add more faces, do I need to train everything from the beginning, or is there a command to continue training?
3 - If I have two or three people who look alike, the result contains only one of them. How can I get all of them, for example in a txt file?

MLDSBigGuy commented 6 years ago

1 - As many as you want to train.

2 - I think that with the current code we have to train again. It would be helpful if someone could comment on this point. Do we need to retrain on the whole dataset (e.g. 10,000 images) when a new observation is added? Is there something like incremental/batch learning for nearest neighbours?

3 - I think you need to play with kneighbors.

hanckmail commented 6 years ago

I almost got it, my friend. I added the line print(closest_distances):

closest_distances = knn_clf.kneighbors(faces_encodings, n_neighbors=8)
print(closest_distances)

It prints the 8 closest neighbors, but only their distances, not their names:

(array([[0. , 0.38515912, 0.81486565, 0.8384682 , 0.8384682 , 0.86994276, 0.93531323, 0.93531323]]), array([[3, 7, 4, 2, 6, 0, 1, 5]]))

Please help me add the names.

MLDSBigGuy commented 6 years ago

closest_distances = knn_clf.kneighbors(faces_encodings, n_neighbors=1)

is_recognized = [closest_distances[0][i][0] <= DIST_THRESH for i in range(len(X_faces_loc))]

return [(basename, pred) if rec else (basename, "No match found") for pred, rec in zip(knn_clf.predict(faces_encodings), is_recognized)]

You need to modify these 3 lines of code. In closest_distances[0][i], the leading [0] selects the distances, array([[0. , 0.38515912, 0.81486565, 0.8384682 , 0.8384682 , 0.86994276, 0.93531323, 0.93531323]]). In closest_distances[1][i], the leading [1] selects the indices, array([[3, 7, 4, 2, 6, 0, 1, 5]]).

So, use a for loop and store all the nearest neighbours in the is_recognized variable. Then, for all the recognized values, print the names. The names are printed in the third line, when the is_recognized value is true.

BTW, was your second point clarified?

hanckmail commented 6 years ago

Sorry, I'm a newbie, I didn't understand anything.
As I understand it, 0.38515912 is the distance of the second most similar face; my problem is printing its name. For these values,

0.38515912, 0.81486565, 0.8384682 , 0.8384682 , 0.86994276, 0.93531323, 0.93531323

I just want to see their names instead of the distances.

Sorry for my stupidity.

MLDSBigGuy commented 6 years ago

Can you print your is_recognized and tell me what it is? (It is the line after closest_distances.)

If your result is just [True], or you get an error like "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", try using numpy arrays to store the boolean values.

Once we have the boolean values, we can work on printing the names. Let me know if you are able to print the boolean values for all 8 nearest neighbours.

hanckmail commented 6 years ago

I divided the standard face_recognition_knn.py into two parts. The first one trains on the dataset once:

from math import sqrt
from sklearn import neighbors
from os import listdir
from os.path import isdir, join, isfile, splitext
import pickle
from PIL import Image, ImageFont, ImageDraw, ImageEnhance
import face_recognition
from face_recognition import face_locations
from face_recognition.cli import image_files_in_folder

ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'}

def train(train_dir, model_save_path = "", n_neighbors = None, knn_algo = 'ball_tree', verbose=False):
    """
    Trains a k-nearest neighbors classifier for face recognition.

    :param train_dir: directory that contains a sub-directory for each known person, with its name.

     (View in source code to see train_dir example tree structure)

     Structure:
        <train_dir>/
        ├── <person1>/
        │   ├── <somename1>.jpeg
        │   ├── <somename2>.jpeg
        │   ├── ...
        ├── <person2>/
        │   ├── <somename1>.jpeg
        │   └── <somename2>.jpeg
        └── ...
    :param model_save_path: (optional) path to save model of disk
    :param n_neighbors: (optional) number of neighbors to weigh in classification. Chosen automatically if not specified.
    :param knn_algo: (optional) underlying data structure to support knn.default is ball_tree
    :param verbose: verbosity of training
    :return: returns knn classifier that was trained on the given data.
    """
    X = []
    y = []
    for class_dir in listdir(train_dir):
        if not isdir(join(train_dir, class_dir)):
            continue
        for img_path in image_files_in_folder(join(train_dir, class_dir)):
            image = face_recognition.load_image_file(img_path)
            faces_bboxes = face_locations(image)
            if len(faces_bboxes) != 1:
                if verbose:
                    print("image {} not fit for training: {}".format(img_path, "didn't find a face" if len(faces_bboxes) < 1 else "found more than one face"))
                continue
            X.append(face_recognition.face_encodings(image, known_face_locations=faces_bboxes)[0])
            y.append(class_dir)

    if n_neighbors is None:
        n_neighbors = int(round(sqrt(len(X))))
        if verbose:
            print("Chose n_neighbors automatically as:", n_neighbors)

    knn_clf = neighbors.KNeighborsClassifier(n_neighbors=n_neighbors, algorithm=knn_algo, weights='distance')
    knn_clf.fit(X, y)

    if model_save_path != "":
        with open(model_save_path, 'wb') as f:
            pickle.dump(knn_clf, f)
    return knn_clf

if __name__ == "__main__":
    knn_clf = train("knn_examples/train", "knn_examples/train/model/model.txt")

And the second one gets the result:

from math import sqrt
from sklearn import neighbors
from os import listdir
from os.path import isdir, join, isfile, splitext
import pickle
from PIL import Image, ImageFont, ImageDraw, ImageEnhance
import face_recognition
from face_recognition import face_locations
from face_recognition.cli import image_files_in_folder

ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'}

def predict(X_img_path, knn_clf = None, model_save_path ="knn_examples/train/model/model.txt", DIST_THRESH = .45):
    """
    recognizes faces in given image, based on a trained knn classifier

    :param X_img_path: path to image to be recognized
    :param knn_clf: (optional) a knn classifier object. if not specified, model_save_path must be specified.
    :param model_save_path: (optional) path to a pickled knn classifier. if not specified, model_save_path must be knn_clf.
    :param DIST_THRESH: (optional) distance threshold in knn classification. the larger it is, the more chance of misclassifying an unknown person to a known one.
    :return: a list of names and face locations for the recognized faces in the image: [(name, bounding box), ...].
        For faces of unrecognized persons, the name 'N/A' will be passed.
    """

    if not isfile(X_img_path) or splitext(X_img_path)[1][1:] not in ALLOWED_EXTENSIONS:
        raise Exception("invalid image path: {}".format(X_img_path))

    if knn_clf is None and model_save_path == "":
        raise Exception("must supply knn classifier either thourgh knn_clf or model_save_path")

    if knn_clf is None:
        with open(model_save_path, 'rb') as f:
            knn_clf = pickle.load(f)

    X_img = face_recognition.load_image_file(X_img_path)
    X_faces_loc = face_locations(X_img)
    if len(X_faces_loc) == 0:
        return []

    faces_encodings = face_recognition.face_encodings(X_img, known_face_locations=X_faces_loc)

    closest_distances = knn_clf.kneighbors(faces_encodings, n_neighbors=1)

    is_recognized = [closest_distances[0][i][0] <= DIST_THRESH for i in range(len(X_faces_loc))]

    # predict classes and cull classifications that are not with high confidence
    return [(pred, loc) if rec else ("N/A", loc) for pred, loc, rec in zip(knn_clf.predict(faces_encodings), X_faces_loc, is_recognized)]

def draw_preds(img_path, preds):
    """
    shows the face recognition results visually.

    :param img_path: path to image to be recognized
    :param preds: results of the predict function
    :return:
    """
    source_img = Image.open(img_path).convert("RGBA")
    draw = ImageDraw.Draw(source_img)
    for pred in preds:
        loc = pred[1]
        name = pred[0]
        # (top, right, bottom, left) => (left,top,right,bottom)
        draw.rectangle(((loc[3], loc[0]), (loc[1],loc[2])), outline="red")
        draw.text((loc[3], loc[0] - 30), name, font=ImageFont.truetype('Pillow/Tests/fonts/FreeMono.ttf', 30))
    source_img.show()

if __name__ == "__main__":
    for img_path in listdir("knn_examples/test"):
        preds = predict(join("knn_examples/test", img_path), model_save_path="knn_examples/train/model/model.txt")
        print(preds)
        draw_preds(join("knn_examples/test", img_path), preds)

Everything works well, but I am afraid that with similar faces I can get a wrong result and won't see other possible results that could be true.

hanckmail commented 6 years ago

My is_recognized result is [True]

MLDSBigGuy commented 6 years ago

Make your is_recognized store the boolean values for all 8 neighbours.

Currently we cannot print the names of the nearest neighbours because is_recognized is set up so that it stores only one neighbour.

That's why only one [True] value comes out. If we make is_recognized store the boolean values of all the nearest neighbours, we can then try to print the names of all those neighbours.

hanckmail commented 6 years ago

How can I do it? Can you help

MLDSBigGuy commented 6 years ago

Yes, the lines below print the boolean values of all neighbours:

is_recognized = [(closest_distances[0][i] <= DIST_THRESH) for i in range(len(X_faces_loc))]
print(is_recognized[0])

I set n_neighbors to 2, so for me it prints [ True True]

Next, I am thinking about how to go through this is_recognized and relate it to for pred, rec in zip(knn_clf.predict(faces_encodings), is_recognized)

pred is the name of the person.

We are almost there. What do you think?

I am sorry if this is the wrong way to print names. I am also new to this library. But this should work.

hanckmail commented 6 years ago

is_recognized = [(closest_distances[0][i] <= DIST_THRESH) for i in range(len(X_faces_loc))]
print(is_recognized[0])

gives this result:

(array([[0.34390288, 0.41492873]]), array([[3, 7]]))
<generator object predict.. at 0x7f259b5a21a8>
[('Obama', (247, 1146, 632, 760))]

MLDSBigGuy commented 6 years ago

Bro, I got it! It's really simple. We did too much.

knn_clf.kneighbors returns two values. Just change the code as below:

closest_distances, indices = knn_clf.kneighbors(faces_encodings, n_neighbors=2)

Now we have a link to the names of the original training data through indices :)

Ref: http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier.kneighbors

hanckmail commented 6 years ago

Can you post the whole code? I'm going crazy editing it a million times.

"Now we have a link to the names of the original training data through indices :)" How can we print the names? Nothing changed in my result.

MLDSBigGuy commented 6 years ago

Actually, I customized too many parts of it, and posting it would lead to more confusion. Can you print your indices? It should be something like [1,2,3...]

Link these indices to the training y data like this:

for i in indices[0]:
        print(y[i])#This gives the names of the classdir.

Optional: if you want to print the actual image file name, just change this line in the train function: y.append(os.path.splitext(os.path.basename(img_path))[0])
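To make the link between indices and names concrete, here is a small sketch. It assumes labels is the same list y that was passed to knn_clf.fit(X, y), in the same order; neighbor_names is a hypothetical helper, and the labels below are made up for illustration. The distances and indices are the first four values from the output posted earlier in the thread.

```python
def neighbor_names(distances, indices, labels, threshold):
    """For each face, list (name, distance) for the neighbours within `threshold`."""
    results = []
    for face_distances, face_indices in zip(distances, indices):
        matches = [
            (labels[int(i)], float(d))
            for d, i in zip(face_distances, face_indices)
            if d <= threshold
        ]
        results.append(matches)
    return results


# Values in the shape returned by knn_clf.kneighbors(...): one row per face.
distances = [[0.0, 0.38515912, 0.81486565, 0.8384682]]
indices = [[3, 7, 4, 2]]
# Hypothetical training labels, in the order they were appended to y.
labels = ["biden", "biden", "kit", "obama", "kit", "rose", "rose", "obama"]

print(neighbor_names(distances, indices, labels, threshold=0.5))
```

Each index is a position in the training data, so labels[3] and labels[7] are the names of the two closest training images. The same function should also accept the numpy arrays that kneighbors returns, since it only iterates over rows.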

hanckmail commented 6 years ago

What are indices? When I print them, I see [[3, 7]] in the result. I haven't trained on anything of my own yet, only the examples in the knn_examples folder plus a copy of them, 8 folders in total.

MLDSBigGuy commented 6 years ago

Copy the code below and tell me what happens:

import os
from math import sqrt
from sklearn import neighbors
from os import listdir
from os.path import isdir, join, isfile, splitext
import pickle
from PIL import Image, ImageFont, ImageDraw, ImageEnhance
import face_recognition
from face_recognition import face_locations
from face_recognition.cli import image_files_in_folder
X = []
y = []

ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'}

def train(train_dir, model_save_path = "", n_neighbors = None, knn_algo = 'ball_tree', verbose=False):
    """
    Trains a k-nearest neighbors classifier for face recognition.
    :param train_dir: directory that contains a sub-directory for each known person, with its name.
     (View in source code to see train_dir example tree structure)
     Structure:
        <train_dir>/
        ├── <person1>/
        │   ├── <somename1>.jpeg
        │   ├── <somename2>.jpeg
        │   ├── ...
        ├── <person2>/
        │   ├── <somename1>.jpeg
        │   └── <somename2>.jpeg
        └── ...
    :param model_save_path: (optional) path to save model of disk
    :param n_neighbors: (optional) number of neighbors to weigh in classification. Chosen automatically if not specified.
    :param knn_algo: (optional) underlying data structure to support knn.default is ball_tree
    :param verbose: verbosity of training
    :return: returns knn classifier that was trained on the given data.
    """

    for class_dir in listdir(train_dir):
        if not isdir(join(train_dir, class_dir)):
            continue
        for img_path in image_files_in_folder(join(train_dir, class_dir)):
            image = face_recognition.load_image_file(img_path)
            faces_bboxes = face_locations(image)
            if len(faces_bboxes) != 1:
                if verbose:
                    print("image {} not fit for training: {}".format(img_path, "didn't find a face" if len(faces_bboxes) < 1 else "found more than one face"))
                continue
            X.append(face_recognition.face_encodings(image, known_face_locations=faces_bboxes)[0])
            y.append(os.path.splitext(os.path.basename(img_path))[0])

    if n_neighbors is None:
        n_neighbors = int(round(sqrt(len(X))))
        if verbose:
            print("Chose n_neighbors automatically as:", n_neighbors)

    knn_clf = neighbors.KNeighborsClassifier(n_neighbors=n_neighbors, algorithm=knn_algo, weights='distance')
    knn_clf.fit(X, y)

    if model_save_path != "":
        with open(model_save_path, 'wb') as f:
            pickle.dump(knn_clf, f)
    return knn_clf

def predict(X_img_path, knn_clf = None, model_save_path ="", DIST_THRESH = .5):
    """
    recognizes faces in given image, based on a trained knn classifier
    :param X_img_path: path to image to be recognized
    :param knn_clf: (optional) a knn classifier object. if not specified, model_save_path must be specified.
    :param model_save_path: (optional) path to a pickled knn classifier. if not specified, model_save_path must be knn_clf.
    :param DIST_THRESH: (optional) distance threshold in knn classification. the larger it is, the more chance of misclassifying an unknown person to a known one.
    :return: a list of names and face locations for the recognized faces in the image: [(name, bounding box), ...].
        For faces of unrecognized persons, the name 'N/A' will be passed.
    """

    if not isfile(X_img_path) or splitext(X_img_path)[1][1:] not in ALLOWED_EXTENSIONS:
        raise Exception("invalid image path: {}".format(X_img_path))

    if knn_clf is None and model_save_path == "":
        raise Exception("must supply knn classifier either thourgh knn_clf or model_save_path")

    if knn_clf is None:
        with open(model_save_path, 'rb') as f:
            knn_clf = pickle.load(f)

    X_img = face_recognition.load_image_file(X_img_path)
    X_faces_loc = face_locations(X_img)
    if len(X_faces_loc) == 0:
        return []

    faces_encodings = face_recognition.face_encodings(X_img, known_face_locations=X_faces_loc)

    closest_distances,indices = knn_clf.kneighbors(faces_encodings, n_neighbors=1)

    #is_recognized = [closest_distances[0][i][0] <= DIST_THRESH for i in range(len(X_faces_loc))]

    # predict classes and cull classifications that are not with high confidence
    for i in indices[0]:
        print(y[i])#This gives the names of the classdir.

def draw_preds(img_path, preds):
    """
    shows the face recognition results visually.
    :param img_path: path to image to be recognized
    :param preds: results of the predict function
    :return:
    """
    source_img = Image.open(img_path).convert("RGBA")
    draw = ImageDraw.Draw(source_img)
    for pred in preds:
        loc = pred[1]
        name = pred[0]
        # (top, right, bottom, left) => (left,top,right,bottom)
        draw.rectangle(((loc[3], loc[0]), (loc[1],loc[2])), outline="red")
        draw.text((loc[3], loc[0] - 30), name, font=ImageFont.truetype('Pillow/Tests/fonts/FreeMono.ttf', 30))
    source_img.show()

if __name__ == "__main__":
    knn_clf = train("knn_examples/train")
    for img_path in listdir("knn_examples/test"):
        preds = predict(join("knn_examples/test", img_path) ,knn_clf=knn_clf)
        print(preds)
        #draw_preds(join("knn_examples/test", img_path), preds)

hanckmail commented 6 years ago

Traceback (most recent call last):
  File "face_recognition_knn.py", line 120, in <module>
    knn_clf = train("knn_examples/train")
  File "face_recognition_knn.py", line 49, in train
    y.append(os.path.splitext(os.path.basename(img_path))[0])
NameError: name 'os' is not defined

When I change y to y.append(class_dir), and is_recognized = [closest_distances[0][i][0] to is_recognized = [closest_distances[0][i], I get the result (thank God and you), but with an error:

Traceback (most recent call last):
  File "face_recognition_knn.py", line 124, in <module>
    draw_preds(join("knn_examples/test"), preds)
  File "face_recognition_knn.py", line 109, in draw_preds
    source_img = Image.open(img_path).convert("RGBA")
  File "/home/user/.local/lib/python3.5/site-packages/PIL/Image.py", line 2543, in open
    fp = builtins.open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: 'knn_examples/test'

And one more question: as I mentioned before, I divided the script into two parts, one to train and a second to find the result. If I do that now, the second part won't get knn_clf, am I right?
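The IsADirectoryError in the second traceback comes from the call draw_preds(join("knn_examples/test"), preds), which passes the folder itself rather than one file inside it. A small sketch of iterating over image files only is shown below; image_paths is a hypothetical helper, not part of face_recognition.

```python
import os


def image_paths(folder, extensions=(".png", ".jpg", ".jpeg")):
    """Yield the full path of every image file directly inside `folder`."""
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        # Skip sub-directories (such as a model folder) and non-image files.
        if os.path.isfile(path) and filename.lower().endswith(extensions):
            yield path
```

The main loop could then read for path in image_paths("knn_examples/test"): and pass the same path to both predict and draw_preds.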

MLDSBigGuy commented 6 years ago

Ah, that os error: I forgot to add import os at the start. I have added it now, so copy and paste the code again. Splitting your scripts does not affect anything. I commented out the draw_preds call and concentrated only on printing names. Whenever you get stuck, try debugging in the PyCharm editor, which has an inline debugging feature, or use print() statements.

hanckmail commented 6 years ago

When I split it, I get this:

Traceback (most recent call last):
  File "face_recognition_knnpredict.py", line 73, in <module>
    preds = predict(join("knn_examples/test", img_path) ,knn_clf=knn_clf)
NameError: name 'knn_clf' is not defined

If I remove knn_clf = train("knn_examples/train"), as I did earlier, I get:

Traceback (most recent call last):
  File "face_recognition_knnpredict.py", line 71, in <module>
    knn_clf = train("knn_examples/train/model/model.txt")
NameError: name 'train' is not defined

P.S. You did a great job, the script works well. One, and I hope last, problem remains: how can I make 'train' and 'predict' work separately?

MLDSBigGuy commented 6 years ago

As I understand it, you want to keep train in an x.py file and the predict function in a y.py file. Don't put a main block in both x.py and y.py. The main() function should be in only one .py file; access the other files from this main .py file through import statements.

Why do you want to separate these functions into different files? It is not that big a file. You can keep everything in one file if errors keep coming.

hanckmail commented 6 years ago

I want to train once and then compare test faces with my model. If I use the full script, it will overwrite my model and take time to train again. We have a working script, but every time I run it, it retrains on the photos stored in the 'train' folder and then compares them with the photo in the 'test' folder. My photos in the train folder stay the same, so I don't want to lose time retraining every time.

MLDSBigGuy commented 6 years ago

You need to separate the train function into 3 parts:

Everything up to appending X, y goes into one function: def scan() returns X, y
Store X, y in a file: def store() calls scan() and dumps (X, y)
Retrieve X, y from the file: def retrieve() returns X, y
def train() calls the retrieve() function, passes X, y to the knn model's fit function, and the remaining lines stay the same
def predict(): as @FiveMaster suggested, you just need to save the training results to a txt file the first time. Like this: knn_clf = train("knn_examples/train", "./model/model.txt") and then predict like this: preds = predict(join("knn_examples/test", img_path), model_save_path="./model/model.txt") If the training data remains the same, you don't need to train the model again.

Also, as we discussed, for the kneighbors indices linking part, just call the retrieve function to get y. Link the indices from knn_clf.kneighbors with the stored y values from the file.

In the main() function, once you have trained the model, comment out the train call. Once train has been called, your model.txt file is saved at the path you gave, so you only need to call predict() with a link to this model.txt file. Comment out everything else in main() and keep just the line below:
preds = predict(join("knn_examples/test", img_path), model_save_path="./model/model.txt")
Note: give the full path of the model.txt file if the file is not found.
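The scan/store/retrieve split described above might look roughly like the sketch below. It is a sketch under assumptions, not the original script: encode is a hypothetical callable that returns one face encoding for an image path, or None when the image is not usable, and the folder layout is the <train_dir>/<person>/<image> structure from the train docstring.

```python
import os
import pickle


def scan(train_dir, encode):
    """Walk <train_dir>/<person>/<image> and return encodings X and names y."""
    X, y = [], []
    for person in sorted(os.listdir(train_dir)):
        person_dir = os.path.join(train_dir, person)
        if not os.path.isdir(person_dir):
            continue
        for filename in sorted(os.listdir(person_dir)):
            encoding = encode(os.path.join(person_dir, filename))
            if encoding is not None:  # skip images without a usable face
                X.append(encoding)
                y.append(person)
    return X, y


def store(X, y, path):
    """Save the encodings and names so later runs can skip scan()."""
    with open(path, "wb") as f:
        pickle.dump((X, y), f)


def retrieve(path):
    """Load the encodings and names saved by store()."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

After X, y = retrieve(path), the existing knn_clf.fit(X, y) line can be used unchanged. In this kind of pipeline the slow part is usually detecting and encoding the faces rather than fitting, so adding a person could be handled by appending the new encodings to X and y, storing them again, and refitting.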

hanckmail commented 6 years ago

Thank you for everything, but I'm new to Python and I don't understand anything without a full example. I decided to use another script, and it works well for me:

import face_recognition
import pickle
import numpy as np
from PIL import Image, ImageFont, ImageDraw, ImageEnhance

# Load face encodings
with open('dataset_faces.dat','rb') as f:
    all_face_encodings = pickle.load(f)

# Grab the list of names and the list of encodings
face_names = list(all_face_encodings.keys())
face_encodings = np.array(list(all_face_encodings.values()))

# Try comparing an unknown image
unknown_image = face_recognition.load_image_file("test.png")
unknown_face = face_recognition.face_encodings(unknown_image)
result = face_recognition.api.compare_faces(face_encodings, unknown_face, tolerance=0.5)
names_with_result = list(zip(result, face_names))

with open("file.txt", "w") as file:
    print(names_with_result, file=file)

It takes my previously trained file dataset_faces.dat and compares it with the file test.png using a tolerance of 0.5. As a result, it creates a file with both the True and False results and the file names of the trained pictures. I think that is enough for me. Can you help me once more? I want the script to print only the True results with the names of the trained pictures. I think it's not difficult for you.

MLDSBigGuy commented 6 years ago

result = face_recognition.api.compare_faces(face_encodings, unknown_face, tolerance=0.5)

if result: # Get only True values
  names_with_result = list(zip(result, face_names))
  print(names_with_result)

hanckmail commented 6 years ago

if result: ??? And that's all? Or do I need to put True somewhere?

It returns all values.
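A likely reason every value is still printed: a non-empty list is always truthy in Python, so if result: only checks that the list has entries, not that any of them is True. A minimal sketch of keeping only the matches follows. It assumes result and face_names have the same length and order, as in the script above; matching_names is a hypothetical helper and the values are made up.

```python
def matching_names(result, face_names):
    """Return only the names whose comparison result is True."""
    return [name for name, matched in zip(face_names, result) if matched]


# Made-up values in the shape of compare_faces output and the dictionary keys.
result = [False, True, False, True]
face_names = ["1", "2", "3", "4"]

print(matching_names(result, face_names))
```

To write the matches to the file instead, the last line of the script could become print(matching_names(result, face_names), file=file).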

ageitgey / face_recognition

face_recognition_knn.py question #319