deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

What's the Best way to Cluster Faces for Similarity? #1067

Open njho opened 4 years ago

njho commented 4 years ago

I have a set of 1900+ 512-D facial embeddings (vectors), and I'd like to group similar faces by individual. There is also an unknown number of distinct faces.

I've employed sklearn.cluster.DBSCAN, similar to the suggestion in PyImageSearch's Face Clustering with Python. However, it can't cluster effectively and returns 0 clusters. I believe the matrix is too sparse, and I believe there are a couple of options:

I'm in the process of trying the different methodologies right now, but perhaps there is a well-known method/approach I'm missing?
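For readers hitting the same "0 clusters" result, here is a minimal sketch of one likely cause, assuming scikit-learn and synthetic stand-in embeddings (the `fake_embeddings` helper, the raw-vector scale of 20 and `eps=0.3` are all hypothetical). DBSCAN labels noise points -1, so "0 clusters" means every point was treated as noise. With the default Euclidean `eps=0.5`, raw (unnormalized) embeddings of the same person are simply too far apart; `metric="cosine"` compares only their direction.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def fake_embeddings(n_ids=5, per_id=10, dim=512, noise=0.01, scale=20.0, seed=0):
    """Hypothetical stand-in for real face embeddings: n_ids identities with
    per_id noisy captures each, deliberately left unnormalized (length ~ scale)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_ids, dim))
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    X = np.repeat(centers, per_id, axis=0) + noise * rng.normal(size=(n_ids * per_id, dim))
    return scale * X


def count_clusters(labels):
    # DBSCAN labels noise points -1; noise is not a cluster.
    return len(set(labels.tolist()) - {-1})


X = fake_embeddings()

# scikit-learn defaults: eps=0.5, Euclidean distance, min_samples=5.
default_labels = DBSCAN().fit_predict(X)

# Cosine distance ignores vector length; eps=0.3 is a guess to tune on real data.
cosine_labels = DBSCAN(eps=0.3, min_samples=5, metric="cosine").fit_predict(X)

print(count_clusters(default_labels), count_clusters(cosine_labels))
```

On this synthetic data the default call marks every point as noise, while the cosine call finds one cluster per fake identity; real embeddings will need their own `eps`.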

gerald-ftk commented 2 years ago

Hi @njho, it's been a while, but did you find any effective method of clustering with the features generated by insightface?

njho commented 2 years ago

Hey @gerald-ftk, gladly!

There's a Stack Overflow question (linked below) that should answer this. The approach was implemented and did work, with some issues when the person changed the angle of their head. If possible, you'll want to build an average embedding of each person's face.

Unfortunately I can't release the code, but I should be able to answer any questions.

If you don't mind, an upvote would be great too!

https://stackoverflow.com/questions/60323102/clustering-512-d-facial-embeddings-vectors
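One way to read "build an average embedding" is the following sketch, assuming NumPy and cosine similarity as the comparison: L2-normalize each capture of a person, average them, re-normalize the mean, and compare new faces against that template. The helper names and the synthetic data are hypothetical; this is not the unreleased code mentioned above.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors (last axis) to unit length; eps guards against division by zero."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def average_embedding(embeddings):
    """Mean of the unit-normalized embeddings, re-normalized to unit length."""
    return l2_normalize(l2_normalize(embeddings).mean(axis=0))


def cosine_similarity(a, b):
    return float(np.dot(l2_normalize(a), l2_normalize(b)))


rng = np.random.default_rng(0)
center = l2_normalize(rng.normal(size=512))
# Hypothetical captures of one person (e.g. different head angles): center + noise.
captures = center + 0.03 * rng.normal(size=(8, 512))
template = average_embedding(captures[:7])   # template built from 7 captures
stranger = rng.normal(size=512)              # an unrelated face

print(cosine_similarity(template, captures[7]), cosine_similarity(template, stranger))
```

Averaging unit vectors (rather than raw ones) keeps a single long embedding from dominating the template.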

gerald-ftk commented 2 years ago

Thanks for the link!

I will try the methods in your link. I will also try Chinese Whispers clustering, as another issue (#1889) refers to it.

Will report back with my findings.
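A minimal Chinese Whispers sketch in NumPy, assuming a graph whose edges join embeddings with cosine similarity above a threshold, weighted by that similarity (the 0.5 threshold, the iteration cap and the weighting are hypothetical choices, not taken from #1889). Each node starts in its own class; nodes are visited in random order and adopt the label with the largest total edge weight among their neighbors, until a full pass changes nothing.

```python
import numpy as np


def chinese_whispers(embeddings, threshold=0.5, n_iter=20, seed=0):
    """Cluster embeddings with Chinese Whispers on a thresholded cosine graph."""
    rng = np.random.default_rng(seed)
    X = np.asarray(embeddings, dtype=np.float64)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, 0.0)                  # no self-loops
    adj = np.where(sim > threshold, sim, 0.0)   # weighted adjacency matrix
    labels = np.arange(len(X))                  # every node starts in its own class
    for _ in range(n_iter):
        changed = False
        for i in rng.permutation(len(X)):
            neighbors = np.nonzero(adj[i])[0]
            if neighbors.size == 0:
                continue                        # isolated node keeps its own label
            # Sum edge weights per neighbor label and adopt the heaviest label.
            scores = {}
            for j in neighbors:
                scores[labels[j]] = scores.get(labels[j], 0.0) + adj[i, j]
            best = max(scores, key=scores.get)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    _, labels = np.unique(labels, return_inverse=True)   # relabel to 0..k-1
    return labels


rng = np.random.default_rng(1)
centers = rng.normal(size=(2, 512))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
faces = np.vstack([
    centers[0] + 0.01 * rng.normal(size=(6, 512)),   # hypothetical person A
    centers[1] + 0.01 * rng.normal(size=(6, 512)),   # hypothetical person B
    rng.normal(size=(1, 512)),                       # one unrelated face
])
labels = chinese_whispers(faces)
print(labels)
```

Like the other graph-based options, the number of clusters is not specified up front; it falls out of the threshold.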

gerald-ftk commented 2 years ago

So it's been about a week of testing. For those who come across this, it seems like normalizing the data makes all the difference. It doesn't really seem to matter which clustering algorithm you choose: if the data is normalized (I just use sklearn.preprocessing), it seems to work fine, even if your dataset is very difficult (i.e., the faces are hard to tell apart).
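Why normalization helps can be checked with a few lines, assuming scikit-learn's `sklearn.preprocessing.normalize` (L2 norm, one row per embedding) and random stand-in vectors. For unit vectors a and b, ||a - b||^2 = 2 - 2 * cos(a, b), so after normalization Euclidean-distance algorithms such as DBSCAN/HDBSCAN effectively rank pairs by cosine similarity instead of being dominated by vector length.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
# Hypothetical unnormalized embeddings whose lengths vary between 5 and 30.
X = rng.normal(size=(6, 512)) * rng.uniform(5, 30, size=(6, 1))

Xn = normalize(X)  # L2-normalize each row (norm="l2", axis=1 are the defaults)

cos = cosine_similarity(X)                        # unaffected by vector length
sq_dist = euclidean_distances(Xn, squared=True)   # squared Euclidean on unit vectors

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(np.allclose(sq_dist, 2 - 2 * cos))
```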

njho commented 2 years ago

@gerald-ftk Was this normalizing before clustering or after clustering?

gerald-ftk commented 2 years ago

@njho

Normalizing before clustering

I'm still having the occasional strange result with some datasets, but I find that normalizing + HDBSCAN gives the best result.

Jar7 commented 2 years ago


Hi @njho, could you please tell us more about the solution for clustering faces in various poses? Are there multiple clusters for one identity, or a single cluster per identity, after feature normalization? Thank you~

atoaster commented 2 years ago


In my scenario, it was one cluster per identity, it actually worked very well. Sometimes there is the odd scenario where 1 cluster will have 2 identities (twins/people wearing masks/obscured faces), and sometimes 2 clusters correspond to the same person, but 99% of the time, one cluster corresponds to one person (when using HDBSCAN)

Jar7 commented 2 years ago


Thank you for your reply! Have you applied PCA or other techniques to reduce the dimensionality of the features before clustering?

njho commented 2 years ago

Hey @Jar7

I echo @gerald-ftk in saying that one cluster should be one identity (when you're clustering by similarity).

I'd be interested to hear about reducing dimensionality. The 512-D vectors are quite large, and creating similarity matrices requires significant memory.
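The memory concern can be made concrete with a little arithmetic, assuming a dense float32 matrix. An N x N similarity matrix holds N^2 entries, so its size depends on the number of embeddings, not on their dimension; reducing 512-D to fewer dimensions shrinks the N x D embedding array and speeds up each dot product, but not the matrix itself. The helper name is hypothetical; the counts are the 1900+ embeddings from the original question and the 80,000 mentioned later in this thread.

```python
import numpy as np


def similarity_matrix_bytes(n_embeddings, dtype=np.float32):
    # A dense N x N matrix stores N**2 entries, whatever the embedding dimension.
    return n_embeddings ** 2 * np.dtype(dtype).itemsize


for n in (1_900, 80_000):
    print(n, similarity_matrix_bytes(n) / 1e9, "GB")
```

That works out to about 14.4 MB for 1,900 embeddings but about 25.6 GB for 80,000 (float64 doubles both), which is why large collections usually need the matrix computed in blocks or replaced by a neighbor search.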

In our approach, we calculated the cosine similarity between every pair of embeddings and built a similarity matrix.

We then assumed that two face embeddings belong together if their similarity > threshold.

From there, you can construct a graph using NetworkX (or another graph library) and take its connected components, i.e. groups of nodes linked by similarity > threshold edges. Essentially, every set of embeddings connected this way is assumed to be the same person.
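The thresholded-graph approach described above might look like this minimal sketch, assuming NumPy and NetworkX (`nx.Graph`, `nx.connected_components`); the 0.5 threshold and the `cluster_by_threshold` name are hypothetical. Grouping is transitive: if A~B and B~C, then A and C share a cluster even when their own similarity is below the threshold.

```python
import networkx as nx
import numpy as np


def cluster_by_threshold(embeddings, threshold=0.5):
    """Group faces by connected components of a 'similarity > threshold' graph."""
    X = np.asarray(embeddings, dtype=np.float64)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit length, so dot = cosine
    sim = X @ X.T                                       # N x N cosine similarity matrix
    # Upper triangle only (k=1): each pair once, no self-pairs.
    rows, cols = np.nonzero(np.triu(sim > threshold, k=1))

    graph = nx.Graph()
    graph.add_nodes_from(range(len(X)))   # keep unmatched faces as singleton clusters
    graph.add_edges_from(zip(rows.tolist(), cols.tolist()))

    labels = np.empty(len(X), dtype=int)
    for cluster_id, component in enumerate(nx.connected_components(graph)):
        labels[list(component)] = cluster_id
    return labels


rng = np.random.default_rng(2)
centers = rng.normal(size=(3, 512))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
true_ids = np.repeat(np.arange(3), 4)   # 3 hypothetical people, 4 captures each
faces = np.vstack([
    centers[true_ids] + 0.01 * rng.normal(size=(12, 512)),
    rng.normal(size=(1, 512)),            # one unrelated face
])
labels = cluster_by_threshold(faces, threshold=0.5)
print(labels)
```

This is essentially what DBSCAN computes with `min_samples=1`, `metric="cosine"` and `eps = 1 - threshold` (up to whether the boundary is inclusive), since with `min_samples=1` every point is a core point.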

njho commented 2 years ago


@gerald-ftk while we're on the topic, with HDBSCAN, do you need to provide the estimated number of clusters?

Or will it find all the unique identities by itself?

I.e., if the sample video is of a conference with a speaker, there are many miscellaneous face captures unrelated to the main speaker. Let's say there's 1 main speaker and 400 random audience members. Would it work in such a scenario?

gerald-ftk commented 2 years ago

Hi @njho

This method sounds like you are slowly recreating an unsupervised clustering method. In fact, I think these steps are similar to DBSCAN. If you just use DBSCAN, you can probably achieve the same or better results, but you will have to figure out which eps and min_samples values work for you.


You don't need to provide an estimated number of clusters for HDBSCAN. Just like DBSCAN, it is entirely unsupervised! HDBSCAN will ask for min_cluster_size, though.

In your conference example, it would depend on the number of feature vectors for each person. Say your conference speaker had 1000 feature vectors, and everyone else had 5-10. Even with a low min_cluster_size, there's a chance you would only be able to cluster the speaker. With DBSCAN, I found that the minimum-size parameter (min_samples in scikit-learn's DBSCAN) is more true to its name: e.g. if you run DBSCAN with min_samples=5, anyone with at least 5 feature vectors within eps of one another will be clustered together.

So, while HDBSCAN produces far better clusters in my experiments, DBSCAN seems to handle tiny clusters a bit better. The other downside of HDBSCAN is that it is not as easily parallelisable, which can sometimes make it slow. For example, in my case, clustering 80,000 512-D feature vectors takes around 3 hours.
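To illustrate the small-cluster behavior, here is a sketch assuming scikit-learn's DBSCAN with cosine distance on synthetic data: one hypothetical "speaker" with 200 captures and ten hypothetical "audience members" with exactly 5 captures each. With `min_samples=5` (which counts the point itself), each group of 5 mutually close vectors forms its own cluster; `eps=0.3` is a hypothetical value to tune.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
dim = 512
sizes = [200] + [5] * 10   # hypothetical: one speaker, ten audience members
centers = normalize(rng.normal(size=(len(sizes), dim)))
true_ids = np.repeat(np.arange(len(sizes)), sizes)
X = normalize(centers[true_ids] + 0.01 * rng.normal(size=(len(true_ids), dim)))

labels = DBSCAN(eps=0.3, min_samples=5, metric="cosine").fit_predict(X)
n_clusters = len(set(labels.tolist()) - {-1})   # -1 marks noise
print(n_clusters)
```

A person with fewer than `min_samples` close captures would instead end up as noise (-1), so `min_samples` effectively sets the smallest identity you can recover.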

gerald-ftk commented 2 years ago

I have also experimented a little with UMAP dimensionality reduction. I found that UMAP outperforms PCA and t-SNE, but I couldn't tell you why or how.

Generally, we got similar results clustering the 512-D vectors as clustering the lower-dimensional vectors. The big benefit was a massive increase in speed, but for us it wasn't really worth it, since we prioritize accuracy over speed.
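For the dimensionality-reduction question, here is a sketch using PCA as the simplest baseline (UMAP, from the separate `umap-learn` package, is not shown), assuming scikit-learn and synthetic data; the 32-dimension target and `eps=0.3` are hypothetical. The reduced vectors are re-normalized before clustering, since the clustering step still expects normalized data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_ids, per_id, dim = 6, 20, 512
centers = normalize(rng.normal(size=(n_ids, dim)))
true_ids = np.repeat(np.arange(n_ids), per_id)
X = normalize(centers[true_ids] + 0.01 * rng.normal(size=(n_ids * per_id, dim)))

# Reduce 512-D -> 32-D (hypothetical target size), then re-normalize.
X_low = normalize(PCA(n_components=32, random_state=0).fit_transform(X))
labels = DBSCAN(eps=0.3, min_samples=5, metric="cosine").fit_predict(X_low)
print(X_low.shape, len(set(labels.tolist()) - {-1}))
```

Note that PCA centers the data before projecting, so cosine similarities in the reduced space are not identical to those in the original 512-D space.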

Just remember that both of these clustering methods (DBSCAN/HDBSCAN) seem to work well only on normalized data. We just use scikit-learn's Normalizer, and that seemed to do the trick.
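scikit-learn's `Normalizer` can be chained in front of the clusterer so the two steps cannot drift apart. A minimal sketch assuming scikit-learn's `make_pipeline` (a `Pipeline` exposes `fit_predict` when its last step does) and synthetic embeddings with arbitrary lengths; `eps=0.7` is a hypothetical value. On unit vectors, a Euclidean `eps` maps to a cosine cutoff via d = sqrt(2 - 2 * cos), so `eps=0.7` means roughly cos >= 0.755.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

rng = np.random.default_rng(1)
n_ids, per_id, dim = 3, 10, 512
centers = rng.normal(size=(n_ids, dim))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
true_ids = np.repeat(np.arange(n_ids), per_id)
# Hypothetical raw embeddings with arbitrary lengths (scaled 5x to 30x).
lengths = rng.uniform(5, 30, size=(n_ids * per_id, 1))
X = (centers[true_ids] + 0.01 * rng.normal(size=(n_ids * per_id, dim))) * lengths

# Normalizer rescales each row to unit L2 norm; DBSCAN then uses Euclidean distance.
model = make_pipeline(Normalizer(norm="l2"), DBSCAN(eps=0.7, min_samples=5))
labels = model.fit_predict(X)
print(labels)
```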

njho commented 1 year ago

Awesome! Love talking about this stuff 🤓 Thanks @gerald-ftk

Jar7 commented 1 year ago


@njho Thank you so much for the info!