We might like to identify one or more people within the videos. There are a few approaches to this. We would like to be able to identify faces against either a fixed single person or a small dictionary of people.
There are at least two possibilities:
Just go through every detected face and label it as a known person if it falls within a threshold distance of that person's reference. This should be implemented as a baseline since it's simple.
First cluster the headshots into groups and then attempt to label the clusters.
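The threshold baseline from the first option could be sketched roughly as below. This is a minimal sketch under assumptions that are not in the notes: each face is already represented as a non-zero embedding vector, distance is cosine distance, and the names `cosine_distance`, `label_by_threshold`, and the threshold values are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def label_by_threshold(faces, references, threshold=0.4):
    """Give each face the name of its nearest reference, or None.

    faces:      list of embedding vectors (assumed input format)
    references: dict mapping a person's name to one reference embedding
    threshold:  hypothetical cut-off; needs tuning for a real embedding model
    """
    labels = []
    for face in faces:
        best_name, best_dist = None, float("inf")
        for name, ref in references.items():
            d = cosine_distance(face, ref)
            if d < best_dist:
                best_name, best_dist = name, d
        # Faces that are too far from every reference stay unlabelled.
        labels.append(best_name if best_dist < threshold else None)
    return labels


if __name__ == "__main__":
    references = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
    faces = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
    print(label_by_threshold(faces, references, threshold=0.2))
```

A single-person setup is the same thing with a one-entry dictionary.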
People do use the cluster-then-label approach, for example in this preprint: https://arxiv.org/pdf/2010.11732 . It is probably not SOTA, but it is at least simple.
It does make intuitive sense: confounding factors such as pose and lighting vary throughout a video, so the face chips of one person will form a continuum/manifold that clustering should capture (as long as different people's continua don't overlap).
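To make the cluster-then-label idea concrete, here is a small sketch. It is not the method from the preprint: as a stand-in for a real clustering algorithm it links any two faces closer than a threshold and takes connected components, which is enough to show a chain of poses being captured as one cluster. The embedding input, cosine distance, both thresholds, and all function names are assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_by_threshold(embeddings, link_threshold):
    """Connected components of the graph linking faces closer than link_threshold."""
    parent = list(range(len(embeddings)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_distance(embeddings[i], embeddings[j]) < link_threshold:
                parent[find(i)] = find(j)

    # Renumber roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(embeddings))]


def label_clusters(embeddings, cluster_ids, references, label_threshold):
    """Name a cluster if any member is within label_threshold of a reference."""
    names = {cid: None for cid in cluster_ids}
    best = {cid: label_threshold for cid in cluster_ids}
    for emb, cid in zip(embeddings, cluster_ids):
        for name, ref in references.items():
            d = cosine_distance(emb, ref)
            if d < best[cid]:
                names[cid], best[cid] = name, d
    return names  # clusters mapped to None stay available for manual labelling


if __name__ == "__main__":
    faces = [[1, 0, 0], [0.9, 0.4, 0], [0.6, 0.8, 0], [0, 0, 1], [0, 0.1, 1]]
    ids = cluster_by_threshold(faces, link_threshold=0.2)
    print(ids, label_clusters(faces, ids, {"alice": [1, 0, 0]}, 0.2))
```

In this toy data the third face is at cosine distance 0.4 from the reference, so a plain 0.2 threshold would miss it, but it is linked to the reference through the intermediate pose and so inherits the cluster's label.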
An alternative to cluster-then-label would be a simple fixed-point linkage scheme that starts from the labelled datapoints (label propagation? Chinese whispers?), but then we don't get unlabelled clusters. The unlabelled clusters are quite useful, since they allow for some degree of manual fixing after the fact.
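A very rough sketch of what such a fixed-point scheme might look like is below. It is neither Chinese whispers nor a library label-propagation implementation: it simply lets each unlabelled face adopt the label of its nearest already-labelled neighbour within a link threshold, and repeats until nothing changes. The embedding input, cosine distance, threshold, and the name `propagate_labels` are all assumptions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def propagate_labels(embeddings, seed_labels, link_threshold):
    """Spread labels outward from the seeds until a fixed point is reached.

    seed_labels: one entry per embedding, a name for labelled faces and
                 None for unlabelled ones.
    """
    labels = list(seed_labels)
    changed = True
    while changed:
        changed = False
        updates = {}
        for i, label in enumerate(labels):
            if label is not None:
                continue
            best_name, best_dist = None, link_threshold
            for j, other in enumerate(labels):
                if other is None:
                    continue
                d = cosine_distance(embeddings[i], embeddings[j])
                if d < best_dist:
                    best_name, best_dist = other, d
            if best_name is not None:
                updates[i] = best_name
        # Apply updates after the sweep so one round moves labels one hop.
        for i, name in updates.items():
            labels[i] = name
            changed = True
    return labels


if __name__ == "__main__":
    faces = [[1, 0, 0], [0.9, 0.4, 0], [0.6, 0.8, 0], [0, 0, 1], [0, 0.1, 1]]
    seeds = ["alice", None, None, None, None]
    print(propagate_labels(faces, seeds, link_threshold=0.2))
```

The faces that are never reached simply stay `None`, with no grouping among them, which is the drawback noted above.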
How to choose hyperparameters for clustering?