esimov / pigo

Fast face detection, pupil/eyes localization and facial landmark points detection library in pure Go.
MIT License

Is there a way to create new landmark point definitions? #55

Closed (jjzazuet closed this issue 2 years ago)

jjzazuet commented 2 years ago

Hi. I'm trying to implement webcam-based facial motion capture with this library.

The current set of face and retina landmarks is quite good, but I'm wondering if it's possible to extend the landmark regions that the PICO process can identify.

Specifically:

Option 1: adding new face landmarks

I'm assuming that new facial landmark locations would require defining new cascade files. Did the paper authors mention any recommendations on collecting and tagging example training data, or on the training process itself?

facial landmarks

Option 2: adding simple paint-based landmarks

Perhaps a simpler option would be to define cascades that recognize simple shapes, such as dot marks or cross marks, placed on an actor's face where the missing facial landmark points are needed.

paint marks

Let me know if I missed anything.

Thanks!

Resources

AutoDesk SoftImage - The Motion Capture Process

esimov commented 2 years ago

New landmark points have to be trained using some kind of neural network. I obtained the cascade files directly from one of the authors of the paper. This project covers only the computer vision part, not the convolutional neural network part. The papers cited on the Pigo README page also make no reference to the process used for training the cascade files, and I'm not sure whether this information is available. I contacted Nenad Markus (one of the authors) a few years back and talked with him about a few aspects of the project, but we never discussed the neural network training. I might eventually contact him again to ask for help with the neural network part (because I admit it would be great to extend the facial landmark zones with more points of interest).

However, my future interest is to develop this project into a computer vision library that supports feature and object detection, as mentioned on the README page. This is quite a big task, but I consider it doable. For reference, there is a separate ticket covering this requirement: https://github.com/esimov/pigo/issues/38.

jjzazuet commented 2 years ago

@esimov got it, makes sense. Should I close this ticket and continue the conversation in #38 then? Thanks!

esimov commented 2 years ago

You should close it. I might reopen it in case there is more progress regarding new feature points.

jjzazuet commented 2 years ago

Got it. Thanks!

jjzazuet commented 2 years ago

Hi @esimov. I kept pondering this issue over the last few days, and I'd like to run this idea by you.

It looks like the OpenCV project already has some level of tooling available in order to generate training data used to create new Haar cascades:

https://docs.opencv.org/3.4/dc/d88/tutorial_traincascade.html

https://amin-ahmadi.com/cascade-trainer-gui/

Here are some examples of the training process outputs:

https://github.com/opencv/opencv/tree/master/data/haarcascades

So the question is: is there a way to convert an OpenCV XML cascade file into the in-memory format used by Pigo? In other words, is it possible to extend the input cascade file format to read from OpenCV XML files instead of the legacy binary format defined by Nenad Markus?
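To make the question concrete, here is a minimal Go sketch of reading the stage data out of an OpenCV-style XML cascade with `encoding/xml`. It only shows the reading step, not a conversion to Pigo's format. The embedded XML is a hand-written, truncated illustration with made-up values, modelled on my understanding of the newer `opencv_traincascade` layout; the element names and every Go type and function name (`cascadeFile`, `stage`, `parseCascade`, ...) are assumptions, not part of Pigo or OpenCV.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sampleXML is a hand-written illustration, NOT a real trained cascade.
// The element names are assumed to follow the opencv_traincascade layout.
const sampleXML = `<?xml version="1.0"?>
<opencv_storage>
<cascade>
  <featureType>HAAR</featureType>
  <stages>
    <_>
      <maxWeakCount>2</maxWeakCount>
      <stageThreshold>-0.5</stageThreshold>
      <weakClassifiers>
        <_>
          <internalNodes>0 -1 0 -0.03</internalNodes>
          <leafValues>2.0 -2.2</leafValues>
        </_>
        <_>
          <internalNodes>0 -1 1 0.01</internalNodes>
          <leafValues>-1.5 1.3</leafValues>
        </_>
      </weakClassifiers>
    </_>
  </stages>
</cascade>
</opencv_storage>`

// weakClassifier keeps the raw node and leaf strings of one weak classifier.
type weakClassifier struct {
	InternalNodes string `xml:"internalNodes"`
	LeafValues    string `xml:"leafValues"`
}

// stage is one boosting stage: a threshold plus its weak classifiers.
type stage struct {
	MaxWeakCount   int              `xml:"maxWeakCount"`
	StageThreshold float64          `xml:"stageThreshold"`
	Weak           []weakClassifier `xml:"weakClassifiers>_"`
}

// cascadeFile is a hypothetical container for the parts read here.
type cascadeFile struct {
	FeatureType string  `xml:"cascade>featureType"`
	Stages      []stage `xml:"cascade>stages>_"`
}

// parseCascade decodes the XML document into a cascadeFile.
func parseCascade(data []byte) (*cascadeFile, error) {
	var c cascadeFile
	if err := xml.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	c, err := parseCascade([]byte(sampleXML))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, s := range c.Stages {
		fmt.Printf("stage %d: threshold=%v weak classifiers=%d\n",
			i, s.StageThreshold, len(s.Weak))
	}
}
```

Reading the file is the easy part. Whether the decoded stages can be mapped onto the structure Pigo evaluates is the open question.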

If this is possible, I think it would be useful to convert XML cascades into the face cascade format since it can be applied to the whole input image.

I realized this when I started creating a Java port for Pigo:

https://github.com/vaccovecrana/kimaris

Let me know what you think.

Thanks!

esimov commented 2 years ago

@jjzazuet I'm not convinced this approach would work, for several reasons. First, the XML structure is quite different, but that's not the main obstacle. The main problem, from my perspective, is that the algorithm itself has been adapted to the binary tree cascade structure. I found similarities between the OpenCV XML cascade tree structure and the in-memory, binary one used by Pigo, but I also found differences. Overall they resemble each other in many respects (the leaf nodes are present, and there is also a threshold and a weak classifier counter), but the tree depth and the tree codes are missing from the XML cascade files. These are key parts of the algorithm. So adapting the algorithm to the XML cascade files would mean rewriting the whole code.
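As a rough illustration of why the tree depth and tree codes matter, here is a simplified Go sketch of a pixel-comparison decision tree cascade. It is not Pigo's actual code: the types (`gray`, `tree`), the heap-style node layout, and the offset scaling are assumptions chosen to keep the example small. Each internal node stores four signed offsets (the "codes") naming two pixels to compare, and the depth fixes how many comparisons lead to a leaf.

```go
package main

import "fmt"

// gray is a minimal grayscale image (hypothetical helper type).
type gray struct {
	pix        []uint8
	rows, cols int
}

func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// at returns the pixel at (r, c), clamping coordinates to the image bounds.
func (g gray) at(r, c int) uint8 {
	return g.pix[clamp(r, 0, g.rows-1)*g.cols+clamp(c, 0, g.cols-1)]
}

// tree is a simplified, assumed layout for one tree of the cascade.
type tree struct {
	depth     int
	codes     []int8    // 4 offsets per internal node, heap order, slot 0 unused
	preds     []float32 // one prediction per leaf, 2^depth entries
	threshold float32   // cumulative score needed to pass this tree
}

// evalTree walks one tree for a window of the given size centred at (row, col).
func evalTree(t tree, g gray, row, col, size int) float32 {
	idx := 1 // the root sits at heap index 1
	for d := 0; d < t.depth; d++ {
		c := t.codes[4*idx : 4*idx+4]
		// Offsets are scaled by the window size (assumed 1/256 units).
		p1 := g.at(row+int(c[0])*size/256, col+int(c[1])*size/256)
		p2 := g.at(row+int(c[2])*size/256, col+int(c[3])*size/256)
		bit := 0
		if p1 <= p2 {
			bit = 1
		}
		idx = 2*idx + bit // descend to the left or right child
	}
	return t.preds[idx-(1<<t.depth)]
}

// classify sums tree outputs and rejects as soon as a threshold is missed.
func classify(trees []tree, g gray, row, col, size int) (float32, bool) {
	var score float32
	for _, t := range trees {
		score += evalTree(t, g, row, col, size)
		if score <= t.threshold {
			return score, false
		}
	}
	return score, true
}

func main() {
	// One depth-1 tree comparing the pixel above the centre with the centre.
	trees := []tree{{
		depth:     1,
		codes:     []int8{0, 0, 0, 0, -128, 0, 0, 0},
		preds:     []float32{-1.0, 2.5},
		threshold: 0,
	}}
	img := gray{pix: []uint8{0, 10, 0, 0, 200, 0, 0, 0, 0}, rows: 3, cols: 3}
	fmt.Println(classify(trees, img, 1, 1, 2))
}
```

In this layout the evaluator cannot run without `depth` and `codes`, which is the kind of information reported as missing from the XML files.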

I will try to contact Nenad.

Btw: thumbs up for your Java port!