how about the speed? - Githubissues

We need to consider two things:

Detection: detects faces globally in the image. This generally yields bounding rectangles or ellipses.
Regression: positions the face landmarks within the detected faces (main goal of DEST).

Face detection (from OpenCV) is an order of magnitude slower than regressing the landmarks using DEST. So if regression takes about 5e-3 seconds (5 milliseconds), detection will take around 5e-2 seconds (50 milliseconds).

Unfortunately, you need some initial bounding rectangle that roughly defines the area of the face. Ideally, the method you use to determine that rectangle is the same during training and during testing of the algorithm. In real-time tracking, performance is therefore limited by the detection, not by the regression of the face landmarks.
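To make the bottleneck concrete, here is a small sketch of the arithmetic. The 50 ms and 5 ms figures are the illustrative numbers from above, not measurements, and the function names are made up for this example.

```cpp
#include <cassert>

// Frames per second achievable when one frame costs `frameMs` milliseconds.
double framesPerSecond(double frameMs) {
    assert(frameMs > 0.0);
    return 1000.0 / frameMs;
}

// Fraction of the per-frame time spent in detection when both detection
// and landmark regression run on every frame.
double detectionShare(double detectMs, double regressMs) {
    assert(detectMs >= 0.0 && regressMs >= 0.0);
    assert(detectMs + regressMs > 0.0);
    return detectMs / (detectMs + regressMs);
}
```

With those numbers, `framesPerSecond(50.0 + 5.0)` is roughly 18 frames per second and `detectionShare(50.0, 5.0)` is about 0.91, whereas regression alone (`framesPerSecond(5.0)`) would allow 200 frames per second.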

In the dest_track_video.cpp example I tried to avoid running detection in every frame by simulating a detector result from the previous alignment result. This allows you to perform detection only every n-th frame, but carries the risk that you lose tracking.
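The idea can be sketched roughly as follows. This is not the code from dest_track_video.cpp: `Point`, `Rect`, `rectFromLandmarks` and `needsDetection` are hypothetical names, and taking the plain bounding box of the landmarks is an assumption about how a detector result could be simulated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };
struct Rect  { float minX, minY, maxX, maxY; };

// Hypothetical helper: derive a detector-like rectangle from the landmarks
// of the previous frame by taking their axis-aligned bounding box.
Rect rectFromLandmarks(const std::vector<Point>& landmarks) {
    assert(!landmarks.empty());
    Rect r{landmarks[0].x, landmarks[0].y, landmarks[0].x, landmarks[0].y};
    for (const Point& p : landmarks) {
        r.minX = std::min(r.minX, p.x);
        r.minY = std::min(r.minY, p.y);
        r.maxX = std::max(r.maxX, p.x);
        r.maxY = std::max(r.maxY, p.y);
    }
    return r;
}

// Run the real detector on every n-th frame, or whenever no previous
// landmarks are available (first frame, or tracking was lost).
bool needsDetection(std::size_t frameIndex, std::size_t n,
                    bool havePreviousLandmarks) {
    assert(n >= 1);
    return !havePreviousLandmarks || frameIndex % n == 0;
}
```

In a tracking loop you would call `needsDetection` once per frame, run the face detector when it returns true, and otherwise feed `rectFromLandmarks` of the previous result to the regressor. Since the rectangle should resemble what the detector produced during training, the raw bounding box would probably need to be scaled or shifted to match the detector's typical output.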

cheind / dest

how about the speed? #10