Tobias-Fischer / rt_gene

RT-GENE: Real-Time Eye Gaze and Blink Estimation in Natural Environments
http://www.imperial.ac.uk/personal-robotics

Questions about output data #81

Closed kchen003 closed 4 years ago

kchen003 commented 4 years ago

I currently have two problems.

Problem 1: I'm running the estimate_gaze_standalone file and printing the gaze and head pose data to a txt file. But I find that both the gaze and head pose parameters are arrays like [0.14779022336006165, 0.401142060756778335]. I would like to know what the two numbers in this array represent: are they the directions of the gaze and head pose respectively? Shouldn't a direction be 3D data, i.e. three numbers in each array?

Problem 2: My current method is to extract frames from the video and store the jpg files in the samples_gaze folder. But the script displays each image after processing it. How can I automatically process all of the stored images without having to manually close the display window for each photo?

ahmed-alhindawi commented 4 years ago

Hi. For problem 1: you can have a read of the paper to understand the output: https://openaccess.thecvf.com/content_ECCV_2018/html/Tobias_Fischer_RT-GENE_Real-Time_Eye_ECCV_2018_paper.html

Alternatively, look at the estimate_gaze function in estimate_gaze_standalone.py and follow the logic there.

For problem 2: look at the help output of estimate_gaze_standalone.py, specifically the options related to visualising the output.

Hope that helps.

Tobias-Fischer commented 4 years ago

1) For head pose, we first align the head to remove roll, so the output is just yaw and pitch. For gaze, it's only up/down and left/right. So each is a 2-element array.

2) Add --no-vis-headpose and --no-vis-gaze as arguments and it won't display the windows.
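A minimal sketch of how those two angles could be turned into a 3D gaze vector. The (yaw, pitch) ordering and the axis convention below are assumptions; the repo's gaze_tools.py is the authoritative reference:

```python
import numpy as np

def gaze_angles_to_vector(phi, theta):
    """Convert gaze angles (phi = yaw, theta = pitch, both in radians)
    into a 3D unit vector, assuming x right, y down, z away from camera."""
    return np.array([
        -np.cos(theta) * np.sin(phi),  # left/right component
        -np.sin(theta),                # up/down component
        -np.cos(theta) * np.cos(phi),  # depth; ~-1 when looking at the camera
    ])

# The pair from the original post, with (yaw, pitch) order assumed:
print(gaze_angles_to_vector(0.14779022336006165, 0.401142060756778335))
```

The result is always a unit vector, since cos²θ(sin²φ + cos²φ) + sin²θ = 1.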

kchen003 commented 4 years ago

Thank you very much, my previous problem has been solved. But I now have a new question: when I run the standalone script on the samples folder, in what order are the images read and processed? I've noticed that the script doesn't seem to take the images from samples sequentially; if I put in 10 images in chronological order, they don't appear to be processed in that order.


Tobias-Fischer commented 4 years ago

I just committed https://github.com/Tobias-Fischer/rt_gene/commit/cd30dcf84f605282d8f302d20429aa09114444e9 and it now iterates through the folder in order. Hope that helps.
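For anyone on an older checkout, a quick workaround is to sort the file list yourself before processing. A minimal sketch, using the samples_gaze folder mentioned above:

```python
import os

image_dir = 'samples_gaze'  # folder of extracted frames, as described above
image_paths = sorted(
    os.path.join(image_dir, name)
    for name in os.listdir(image_dir)
    if name.lower().endswith(('.jpg', '.jpeg', '.png'))
)
for path in image_paths:
    print(path)  # replace with the actual per-image processing call
```

Note that this is a lexicographic sort, so zero-padded frame names (frame_001.jpg rather than frame_1.jpg) are needed to keep chronological order.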

lindayuanyuan commented 3 years ago
> 1. For head pose, we first align the head to remove roll, so the output is just yaw and pitch. For gaze, it's only up/down and left/right. So each is a 2-element array.
> 2. Add --no-vis-headpose and --no-vis-gaze as arguments and it won't display the windows.

Hi, thanks for the answer about the output. I used the sample gaze images to understand the output arrays. Below is the estimated output for gaze_left and gaze_up. Are the numbers yaw and pitch angles? They seem to be between -1 and 1. Thanks.

gaze_left: [0.2422601864213032, -0.3650999076154193], [-0.39496946334838867, -0.17536866664886475]
gaze_up: [0.20357558692153432, -0.589249770677153], [-0.11519338190555573, 0.16353148221969604]

Tobias-Fischer commented 3 years ago

Yes - they are angles (in radians). See https://zenodo.org/record/2529036 for more information.
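As a quick sanity check, converting the sample values above from radians to degrees gives plausible gaze angles (a throwaway snippet; the (yaw, pitch) order is assumed):

```python
import math

# First pair from the gaze_left sample above, assumed (yaw, pitch) in radians
phi, theta = 0.2422601864213032, -0.3650999076154193
print(math.degrees(phi), math.degrees(theta))  # roughly 13.9 and -20.9 degrees
```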

lindayuanyuan commented 3 years ago

> Yes - they are angles (in radians). See https://zenodo.org/record/2529036 for more information.

Got it. Thanks for the quick response.

lindayuanyuan commented 3 years ago

> Yes - they are angles (in radians). See https://zenodo.org/record/2529036 for more information.

One quick follow-up. If I understand the paper correctly, gaze estimation in RT-GENE essentially combines the head pose angle with respect to the camera and the gaze angle in the eye-tracking frame, so the final gaze estimate is expressed in the frame of the RGB-D camera, right? For example, given a picture, the gaze estimate would be with respect to the viewer looking at the picture, except with left and right swapped. If not, how should I combine the head pose angle and the gaze angle into the absolute gaze angle toward the viewer: simply add them up? Any thoughts are appreciated, thanks.

Tobias-Fischer commented 3 years ago

There are two networks - one purely estimating the head pose, and the other purely estimating the eye gaze. See e.g. https://github.com/Tobias-Fischer/rt_gene/blob/master/rt_gene/scripts/estimate_gaze.py on how to fuse head pose angles and gaze angles.
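Schematically, the two-network pipeline looks roughly like this; every name and value below is an illustrative stand-in rather than the actual RT-GENE API:

```python
import numpy as np

def headpose_network(face_patch):
    # Stand-in for the head pose network: returns (yaw, pitch) of the
    # head in radians w.r.t. the camera. Dummy values for illustration.
    return 0.10, -0.05

def gaze_network(left_eye_patch, right_eye_patch, head_pose):
    # Stand-in for the gaze network: returns (yaw, pitch) of the gaze
    # in radians w.r.t. the head frame. Dummy values for illustration.
    return 0.24, -0.37

face = left_eye = right_eye = np.zeros((36, 60, 3))  # placeholder patches
head_pose = headpose_network(face)
gaze = gaze_network(left_eye, right_eye, head_pose)
print('head pose (yaw, pitch) w.r.t. camera:', head_pose)
print('eye gaze  (yaw, pitch) w.r.t. head:  ', gaze)
# Fusing the two is a frame transform; see the sketch further down.
```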

lindayuanyuan commented 3 years ago

Thanks for such a quick response. I see the two-networks part. For the pure gaze estimation part, the ground truth annotated by the eye tracker is the gaze angle with respect to the head-facing direction, right? However, I don't get the part about how to fuse head pose angles and gaze angles. Can you point me to the exact code or description in the paper? Thanks.

Tobias-Fischer commented 3 years ago

Yes, w.r.t. head facing direction. In the ROS code (above), the eye gaze angle is published w.r.t. the head coordinate frame.

lindayuanyuan commented 3 years ago

> Yes, w.r.t. head facing direction. In the ROS code (above), the eye gaze angle is published w.r.t. the head coordinate frame.

Thanks for the clarification. I understand now that the published gaze is in the head coordinate frame. Reading your paper again, it says: "Most importantly, we can map the gaze vector g to the frame of the RGB-D camera using T(E→C)". I am wondering how I can transform the published gaze estimate in head coordinates g_h to camera coordinates g_c. I also followed the paper by Deng and Zhu [10] on the gaze transform layer, which uses formula 2: g_c = R g_h. Does that mean I can use the head pose and the gaze vector (from the estimate output) to get the transformed gaze estimate in the camera coordinate frame? Sorry for the long question. I appreciate your reply, thanks.

Tobias-Fischer commented 3 years ago

Hello, this is all done in the ROS code via the tf library. As this is a general question about coordinate transforms and not about RT-GENE, I recommend opening a question on e.g. stackoverflow.
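For what it's worth, a minimal sketch of Deng and Zhu's g_c = R g_h using scipy; the yaw/pitch axis order, signs, and example values here are assumptions that should be checked against the actual tf tree:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Assumed example values in radians: head pose w.r.t. the camera,
# and a gaze vector expressed in the head frame ("straight ahead").
head_yaw, head_pitch = 0.10, -0.05
g_h = np.array([0.0, 0.0, 1.0])

# Rotation from head frame to camera frame: yaw about y, then pitch about x.
R_head = R.from_euler('yx', [head_yaw, head_pitch]).as_matrix()
g_c = R_head @ g_h  # Deng and Zhu's formula 2: g_c = R g_h
print(g_c)
```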

lindayuanyuan commented 3 years ago

Thanks for the quick response, as always. I will learn about coordinate transforms from other sources. Just wondering: will RT-GENE publish the gaze estimate in the image/camera frame? It has useful applications, and the transformation is easier to do within your code. Thanks.

Tobias-Fischer commented 3 years ago

Yes, it does already. Please read some tutorials on the tf transform in ROS: http://wiki.ros.org/tf/Tutorials
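For reference, the usual ROS pattern for reading such a transform out of the tf tree looks like this; the frame names below are hypothetical, so check the frames actually published by the RT-GENE launch files:

```python
import rospy
import tf2_ros

rospy.init_node('gaze_frame_example')
tf_buffer = tf2_ros.Buffer()
tf_listener = tf2_ros.TransformListener(tf_buffer)

rospy.sleep(1.0)  # give the listener time to fill the buffer

# Look up the transform from the (hypothetical) head frame into the
# camera frame; the real frame names depend on your launch files.
transform = tf_buffer.lookup_transform('camera_frame', 'head_frame',
                                       rospy.Time(0), rospy.Duration(1.0))
print(transform)
```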

lindayuanyuan commented 3 years ago

In the estimate_gaze_standalone.py output? I am using existing images. Thanks for the resources; I will study them carefully.