@erkil1452 Hello, I noticed there is a new gaze dataset called ETH-XGaze (https://ait.ethz.ch/projects/2020/ETH-XGaze/), but I still have some questions about how to get the gaze label. 1) How do I get the target's 3D coordinates? You told me to measure them with a tape measure, which raises a question: which unit should I use, mm, cm, or m? Also, how should I choose the origin of the world coordinate system? When the origin changes, the target's world coordinates change too, and then when we convert them into the camera's coordinate system, the target's 3D coordinates will be different. 2) How do I get the eye's 3D coordinates? I want to use dlib to get the face landmarks and then fit a 3D face landmark model, but this method only gives me a rotation matrix and a translation vector through PnP, and I don't know how to get the eye's 3D coordinates from those. Can you help me?
Yes, ETH-XGaze is a very nice dataset: fewer subjects and an indoor setting, but superior image quality. The author is my colleague now.
You can use any of those units. The gaze vector gets normalized at the end, so the choice makes no difference.
The origin of the gaze vector should match the origin we used when annotating the dataset. It is approximately the mid-point between the two eyes. You do not have to be super precise, as our vision-based annotation is not precise either; a small offset of the origin makes a negligible difference in the direction vector when the gaze target is over a meter away.
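To get a feel for why a small origin offset is negligible, here is a quick numeric sketch (the distances are made-up placeholders, not values from the dataset): a 1 cm error in the assumed gaze origin, with the target 1.5 m away, shifts the gaze direction by only a fraction of a degree.

```python
import numpy as np

# Placeholder numbers: gaze target 1.5 m in front of the face,
# and a 1 cm lateral error in the assumed gaze origin.
target = np.array([0.0, 0.0, 1.5])       # meters, camera coordinates
origin_true = np.array([0.0, 0.0, 0.0])
origin_off = np.array([0.01, 0.0, 0.0])  # origin shifted by 1 cm

def gaze_dir(origin, target):
    """Unit direction vector from the gaze origin to the gaze target."""
    v = target - origin
    return v / np.linalg.norm(v)

d1 = gaze_dir(origin_true, target)
d2 = gaze_dir(origin_off, target)

# Angle between the two direction vectors, in degrees.
angle_deg = np.degrees(np.arccos(np.clip(d1 @ d2, -1.0, 1.0)))
print(f"angular difference: {angle_deg:.2f} deg")  # well under 1 degree
```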
The origin of the coordinate system is irrelevant as long as it is the same for measuring both the gaze origin and the gaze target. It makes no difference after you subtract one from the other to get the direction vector. You can choose any fixed point in the scene; e.g. a point on the floor below the center of the screen would make sense.
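The invariance to the world origin (and to the unit) can be checked in a couple of lines; the coordinates and the origin shift below are arbitrary examples:

```python
import numpy as np

def gaze_dir(origin, target):
    """Unit gaze direction from origin to target (any consistent unit)."""
    v = np.asarray(target, dtype=float) - np.asarray(origin, dtype=float)
    return v / np.linalg.norm(v)

eye = np.array([0.3, 1.2, 0.5])      # gaze origin in meters, world frame A
tgt = np.array([0.3, 1.0, 1.5])      # gaze target in meters, world frame A

d_a = gaze_dir(eye, tgt)

# Same physical points, but with the world origin moved by an arbitrary
# offset and the unit switched from meters to centimeters.
shift = np.array([123.0, -45.0, 67.0])
d_b = gaze_dir((eye + shift) * 100.0, (tgt + shift) * 100.0)

# The normalized direction vector is identical in both cases.
print(d_a, d_b)
```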
I am not sure how many points you want to collect. If it is just for testing, you can use a tape measure to estimate the distance of the person's head from the screen, its height above the floor, etc. If you need to automate it, then your dlib approach should work. What you are missing is the distance: you need to measure the distance from the camera (= depth). You can try the trick from our paper, or you can use a Kinect or a stereo camera to measure it directly.
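On question 2: once PnP has given you the rotation and translation that map the 3D face model into camera coordinates, you get the eye's 3D position by applying that transform to the eye point of the face model. A minimal sketch, assuming `R` and `t` have already been obtained from `cv2.solvePnP` (with the rotation vector converted to a matrix via `cv2.Rodrigues`); the numeric values below are placeholders, not real calibration output:

```python
import numpy as np

# Placeholder pose: in practice R and t come from cv2.solvePnP on the
# dlib landmarks, mapping face-model coordinates to camera coordinates.
R = np.eye(3)                      # 3x3 rotation matrix (from cv2.Rodrigues)
t = np.array([0.0, 0.0, 600.0])    # translation, same unit as the face model (here mm)

# Mid-point between the eyes, expressed in the 3D face model's own frame,
# e.g. the average of the model's four eye-corner landmarks (placeholder value).
eye_model = np.array([0.0, -30.0, -30.0])

# The same point expressed in camera coordinates:
eye_cam = R @ eye_model + t
print(eye_cam)
```

Note that `eye_cam` inherits the unit of the 3D face model, so make sure the gaze target is measured (or converted) in that same unit before subtracting.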
Also, no worries, but if the matter is not about reporting issues with the repo, it would be easier to communicate directly by e-mail.