frankier / skelshop

📺 📰 🧑‍💼 Toolkit for skeleton & face analysis of talking heads (e.g. news) videos 🧑‍💼 📰 📺
https://frankier.github.io/skelshop
MIT License

Person identification #39

Closed frankier closed 3 years ago

frankier commented 3 years ago

We might like to identify one or more people within the videos. There are a few approaches to this. We would like to be able to identify against either a single fixed person or a small dictionary of people.

There should be at least two possibilities:

  1. Just go through and label each face that is below a threshold distance from a reference person. This should be implemented as a baseline since it's simple.
  2. First cluster headshots into groups and then attempt to label the clusters.
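A minimal sketch of the baseline (option 1), assuming each face has already been turned into an embedding vector and that we have one reference embedding per known person. The function name `label_by_threshold`, the toy 2-D vectors and the threshold of 0.6 are all made up for illustration; real embeddings and a sensible threshold depend on the face model used.

```python
import math

def label_by_threshold(embeddings, references, threshold):
    """Label each embedding with its nearest reference person, or None if too far."""
    labels = []
    for emb in embeddings:
        best_name, best_dist = None, float("inf")
        for name, ref in references.items():
            dist = math.dist(emb, ref)  # Euclidean distance (Python 3.8+)
            if dist < best_dist:
                best_name, best_dist = name, dist
        # Only accept the nearest person if they are close enough.
        labels.append(best_name if best_dist < threshold else None)
    return labels

# Toy data: hypothetical 2-D "embeddings" standing in for real face embeddings.
references = {"alice": [0.0, 0.0], "bob": [1.0, 1.0]}
faces = [[0.1, 0.0], [0.9, 1.1], [5.0, 5.0]]
labels = label_by_threshold(faces, references, threshold=0.6)
```

With a single entry in `references` this covers the fixed single person case; with several it covers the small dictionary case.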

People do use the cluster-then-label approach, as in this preprint: https://arxiv.org/pdf/2010.11732 . It's probably not SOTA, but it is at least simple.

It does intuitively make sense: confounding factors such as pose and lighting vary throughout a video, so each person's face chips will form a continuum/manifold that clustering should capture (just as long as different people's continua don't overlap).
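To make the "label the clusters" step concrete, here is a rough sketch that assumes clustering has already produced a cluster id per face chip. It labels each cluster by comparing its centroid against the reference embeddings; using the centroid is just one possible choice (a vote over members would be another), and `label_clusters`, the toy vectors and the threshold are hypothetical.

```python
import math
from collections import defaultdict

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def label_clusters(embeddings, cluster_ids, references, threshold):
    """Map each cluster id to the nearest reference name, or None if none is close."""
    members = defaultdict(list)
    for emb, cid in zip(embeddings, cluster_ids):
        members[cid].append(emb)
    cluster_labels = {}
    for cid, points in members.items():
        centre = centroid(points)
        name, dist = min(
            ((n, math.dist(centre, ref)) for n, ref in references.items()),
            key=lambda pair: pair[1],
        )
        # Clusters far from every reference stay unlabelled (None).
        cluster_labels[cid] = name if dist < threshold else None
    return cluster_labels

# Toy data: two clusters, only one of which is near the known person.
faces = [[0.0, 0.1], [0.2, 0.1], [3.0, 3.0], [3.2, 3.0]]
cluster_ids = [0, 0, 1, 1]
references = {"anchor": [0.1, 0.0]}
cluster_labels = label_clusters(faces, cluster_ids, references, threshold=0.6)
```

Cluster 1 comes back as `None`, i.e. an unlabelled cluster that could still be named by hand later.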

How to choose hyperparameters for clustering?

  1. Could reuse the hyperparameters from https://github.com/yl-1993/learn-to-cluster on the YouTube Faces dataset
    • But wait! That says minPts = 1, so it's equivalent to simple single-linkage clustering. Presumably it's faster to just use single-linkage clustering directly if possible.
  2. Could use grid search and the silhouette coefficient like Himani's GSOC: https://github.com/Himani2000/GSOC_2020
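Both options can be sketched together in plain Python. With minPts = 1 every point counts as a core point, so the clusters are just the connected components of the graph that joins points within `eps` of each other, which is single-linkage clustering cut at `eps`. The sketch below does that with a union-find and then picks `eps` by grid search on the mean silhouette coefficient. Everything here (function names, toy data, candidate values, and scoring degenerate clusterings as -1.0) is an assumption for illustration, not taken from either linked repository.

```python
import math

def single_linkage(points, eps):
    """Cluster ids from connected components of the 'distance <= eps' graph."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    # Undefined for one cluster or all singletons; treat as the worst score.
    if len(clusters) < 2 or len(clusters) == len(points):
        return -1.0
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to own cluster; b: mean distance to nearest other cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def best_eps(points, candidates):
    """Grid search: return the candidate eps with the highest silhouette."""
    scored = [(silhouette(points, single_linkage(points, e)), e) for e in candidates]
    return max(scored)[1]

# Toy data: two tight groups of three points each.
faces = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0], [2.1, 2.0], [2.0, 2.1]]
eps = best_eps(faces, [0.05, 0.2, 5.0])
```

The pairwise loop is quadratic, so this is only meant to show the idea; for real data a library implementation would be the sensible choice.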

An alternative to cluster-then-label would be a simple fixed-point linkage scheme that starts from the labelled datapoints (label propagation? Chinese whispers?), but then we don't get unlabelled clusters. The unlabelled clusters are kind of nice, since they allow for some degree of manual fixing after the fact.
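For comparison, a very rough sketch of the fixed-point idea: start from the labelled points and repeatedly give each unlabelled point the label of its nearest already-labelled neighbour within `eps`, until nothing changes. This is a simplified greedy propagation, not the actual label propagation or Chinese whispers algorithms, and `propagate_labels`, `eps` and the toy data are hypothetical.

```python
import math

def propagate_labels(points, seeds, eps):
    """Spread seed labels to neighbours within eps until a fixed point is reached.

    seeds maps point index -> label; unreachable points end up as None.
    """
    labels = dict(seeds)
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(points):
            if i in labels:
                continue
            # Labelled points within eps, as (distance, label) pairs.
            near = [
                (math.dist(p, points[j]), labels[j])
                for j in labels
                if math.dist(p, points[j]) <= eps
            ]
            if near:
                labels[i] = min(near)[1]  # take the closest labelled neighbour
                changed = True
    return [labels.get(i) for i in range(len(points))]

# Toy data: a chain of points reachable from the seed, plus one outlier.
faces = [[0.0, 0.0], [0.3, 0.0], [0.6, 0.0], [5.0, 5.0]]
propagated = propagate_labels(faces, {0: "anchor"}, eps=0.4)
```

Note how the outlier just ends up as `None` rather than in a cluster of its own, which is exactly the loss of unlabelled clusters described above.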