davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

Will "approximate nearest neighbor" algo work for dlib's face recognition? #2674

rajhlinux closed 1 year ago

rajhlinux commented 1 year ago

Does it make sense to use an approximate nearest neighbor algorithm for dlib's face recognition?

I have been reading about how face recognition works, and it seems that an approximate nearest neighbor algorithm works well for large data sets of image vectors compared to the k-means algorithm.

So for face clustering and image recognition, I was wondering whether it would make sense to spend the time making dlib use an approximate nearest neighbor algorithm to search through an image vector data set that grows in real time.

Basically, I am making a homemade DVR that runs 24/7, stores faces, and automatically generates random labels for them. When a new face is detected, it is stored in the database for future reference.

So dlib will constantly need to search the face database for every face it detects. For a large data set of millions or billions of images, an approximate nearest neighbor algorithm seems to work better, since it is much faster at finding possible matches than the k-means algorithm.

There are some great approximate nearest neighbor libraries in C++ on GitHub, such as "annoy" and "faiss".

Before I jump into this rabbit hole: has anyone thought about the same performance and speed issue of searching a large and expanding database of image vectors for face recognition, and concluded that a-nn (approximate nearest neighbor) is the way to go?

If Spotify and Facebook use a-nn, it seems that dlib would benefit from it for face recognition as well.

If a-nn is workable with dlib and makes sense, what are the first steps I need to look into to get this done?

I have been searching on Google and only find Python implementations of a-nn for face recognition, which does not help me do it in C++ or understand how things work. The code is too short and high level, and I have no idea what is going on.

Thanks.

arrufat commented 1 year ago

There is already an approximate nearest neighbor function in dlib: http://dlib.net/graph_tools.html#find_approximate_k_nearest_neighbors.

However, I think you can reduce the complexity by not storing all the faces. When a new face is detected, if it's already close to any existing face, you don't need to store it. That will reduce the number of faces to compare by a large amount. Then, it's up to you to decide what “near” means in your case.
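The "don't store faces that are already close to a known one" idea can be sketched roughly as follows. This is a minimal sketch in plain C++, not dlib code: `FaceGallery`, `match_or_add`, and `euclidean_distance` are hypothetical names made up for illustration, and the descriptors are assumed to be the 128-float vectors produced by dlib's face recognition model. The threshold is left as a parameter because what counts as "near" is up to you; dlib's face recognition example treats descriptors with a Euclidean distance below 0.6 as the same person.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A face descriptor (assumed: 128 floats per face, as in dlib's model).
using Descriptor = std::vector<float>;

// Euclidean distance between two descriptors of equal length.
float euclidean_distance(const Descriptor& a, const Descriptor& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Hypothetical gallery that keeps one representative descriptor per identity.
class FaceGallery {
public:
    explicit FaceGallery(float threshold) : threshold_(threshold) {}

    // Returns the index of the closest stored face within the threshold.
    // If no stored face is that close, the descriptor is stored as a new
    // identity and its new index is returned.
    std::size_t match_or_add(const Descriptor& face) {
        std::size_t best = faces_.size();  // "no match" sentinel
        float best_dist = threshold_;
        for (std::size_t i = 0; i < faces_.size(); ++i) {
            const float d = euclidean_distance(face, faces_[i]);
            if (d < best_dist) {
                best_dist = d;
                best = i;
            }
        }
        if (best == faces_.size()) faces_.push_back(face);
        return best;
    }

    std::size_t size() const { return faces_.size(); }

private:
    float threshold_;
    std::vector<Descriptor> faces_;
};
```

With this, the gallery grows with the number of distinct people rather than the number of detections, so even the plain linear scan above may stay fast enough for a long time. The scan could later be swapped for an approximate index if the gallery really does get huge.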

rajhlinux commented 1 year ago

Hello, thanks for the reply.

I assumed dlib would have implemented this, but I failed to find it.

That is a great suggestion.

Thanks.

Edit: I noticed that the link points to a header file. Are there any examples of how to use it? I am a bit clueless about how to use the functions correctly.

arrufat commented 1 year ago

I don't think I can explain it better than the documentation: http://dlib.net/dlib/graph_utils/edge_list_graphs_abstract.h.html#find_approximate_k_nearest_neighbors
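For intuition only, here is a rough sketch of the general idea as I read the documentation: examine a limited number of randomly chosen sample pairs, keep the k closest candidates found for each sample, and expect the result to approach exact k nearest neighbors as the number of examined pairs grows. This is plain C++ and not dlib's implementation; `approximate_knn`, `Neighbor`, and `squared_distance` are hypothetical names. Note that it builds a neighbor graph over a fixed batch of samples, which is different from an index you query with one new face at a time.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

using Sample = std::vector<float>;

struct Neighbor {
    std::size_t index;  // position of the neighbor in `samples`
    float distance;     // squared Euclidean distance to that neighbor
};

// Squared Euclidean distance between two samples of equal length.
float squared_distance(const Sample& a, const Sample& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// For every sample, returns up to k of the closest neighbors found among
// `num_pairs` randomly drawn pairs. More pairs give a better approximation.
std::vector<std::vector<Neighbor>> approximate_knn(
    const std::vector<Sample>& samples, std::size_t k,
    std::size_t num_pairs, unsigned seed) {
    std::vector<std::vector<Neighbor>> result(samples.size());
    if (samples.size() < 2) return result;

    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, samples.size() - 1);
    for (std::size_t n = 0; n < num_pairs; ++n) {
        const std::size_t i = pick(rng);
        const std::size_t j = pick(rng);
        if (i == j) continue;  // a sample is not its own neighbor
        const float d = squared_distance(samples[i], samples[j]);
        result[i].push_back({j, d});
        result[j].push_back({i, d});
    }

    for (auto& list : result) {
        // Drop pairs that were drawn more than once.
        std::sort(list.begin(), list.end(),
                  [](const Neighbor& a, const Neighbor& b) { return a.index < b.index; });
        list.erase(std::unique(list.begin(), list.end(),
                               [](const Neighbor& a, const Neighbor& b) { return a.index == b.index; }),
                   list.end());
        // Keep only the k closest candidates, nearest first.
        std::sort(list.begin(), list.end(),
                  [](const Neighbor& a, const Neighbor& b) { return a.distance < b.distance; });
        if (list.size() > k) list.erase(list.begin() + static_cast<std::ptrdiff_t>(k), list.end());
    }
    return result;
}
```

The trade-off is that a small `num_pairs` is fast but may miss true neighbors, while a large one approaches a full pairwise comparison in both cost and accuracy.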

rajhlinux commented 1 year ago

Alright, thanks for your help. The approximate function seems straightforward after reading it over a few times. I also have to read up on how the math of k nearest neighbors works. YouTube has many videos explaining it, so I can understand how the approximation is applied.

dlib-issue-bot commented 1 year ago

Warning: this issue has been inactive for 35 days and will be automatically closed on 2022-12-10 if there is no further activity.

If you are waiting for a response but haven't received one it's possible your question is somehow inappropriate. E.g. it is off topic, you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's official compilation instructions, dlib's API documentation, or a Google search.

dlib-issue-bot commented 1 year ago

Notice: this issue has been closed because it has been inactive for 45 days. You may reopen this issue if it has been closed in error.