Closed zhz125 closed 6 years ago
Can you also let me know the exact command line arguments you used to get the results?
I ran your video through OpenFace, and the head poses look reasonably accurate to me, you can clearly see 4 motions in pitch followed by four motions in yaw, as seen in the graph below:
There is a bit of noise in pitch and roll when yaw motions are happening, but this is expected: even when a person appears to be moving their head only to the side, they are still tilting it slightly or raising it a bit.
If you look at the output video, specifically the bounding box around the head, the tracking looks quite accurate throughout it.
Hi Tadas
We ran OpenFace with "bin/FeatureExtraction -f video_name" and got different results. Your plot makes much more sense; our result shows a peak in pose_Rx and pose_Rz around frames 570-600.
Thanks!
What version of OpenFace are you using?
I generated my results on Windows and Ubuntu machines, but they should be the same on a Mac as well.
Hi, I just realized that we hadn't updated it since March 8th. I rebuilt it and we got the correct results. Thanks!
Hi Tadas,
Some background first:
A) attached .zip file includes video, openface output csv files, and figure containing corresponding head pose time series
B) notational conventions I'll use for head pose:
P = pitch = pose_Rx
R = roll = pose_Rz
Y = yaw = pose_Ry
'from video' = head pose estimates from FeatureExtraction
'from frames' = head pose estimates from FaceLandmarkImg on single frames parsed from the same video
The issues:
We are getting some strange head pose estimations. 'From video', sometimes when a subject rotates the head in Y, we get erroneous estimates for P and R. In the attached demonstration, the subject does 4 rotations each in P (dn/up/dn/up) and Y (L/R/L/R). During the first Y (to the subject's left, without any notable rotations in P and R), the estimated P and R become very extreme.
Curiously, when we use 'from frames' instead of 'from video', the issue goes away. But this approach raises other issues:
1) the P estimates are limited to a smaller range (i.e. approx. -0.3 to 0.4 rad for 'from frames' compared to approx. -0.8 to 1.0 for 'from video');
2) the Y estimates are not only limited to a smaller range but also appear qualitatively different from those generated 'from video'.
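To make range discrepancies like the ones above easy to spot, a minimal sketch for summarizing the pose columns of a FeatureExtraction CSV is shown below. It assumes the column names used in this thread (pose_Rx, pose_Ry, pose_Rz); real OpenFace CSV headers may be padded with spaces, so whitespace is stripped. The inline sample data is purely illustrative.

```python
import csv
import io

# Hypothetical sample standing in for an OpenFace FeatureExtraction CSV.
SAMPLE_CSV = """frame, pose_Rx, pose_Ry, pose_Rz
1, 0.01, 0.02, 0.00
2, 0.05, 0.40, 0.01
3, -0.30, 0.10, -0.02
"""

def pose_ranges(csv_text):
    """Return (min, max) in radians for each head-pose rotation column."""
    # skipinitialspace handles the space-padded headers OpenFace tends to emit.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    cols = ("pose_Rx", "pose_Ry", "pose_Rz")
    values = {c: [] for c in cols}
    for row in reader:
        for c in cols:
            values[c].append(float(row[c]))
    return {c: (min(v), max(v)) for c, v in values.items()}

print(pose_ranges(SAMPLE_CSV))
```

Running this over both the 'from video' and 'from frames' outputs would quantify the smaller pitch range (roughly -0.3 to 0.4 rad vs. -0.8 to 1.0 rad) reported above.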
Attached: vidoes_headPoseData.zip