erkil1452 / gaze360

Code for the Gaze360: Physically Unconstrained Gaze Estimation in the Wild Dataset
http://gaze360.csail.mit.edu

How to compute the data in train.txt #33

Closed jxncyym closed 2 years ago

jxncyym commented 2 years ago

@erkil1452 In train.txt, I notice data formatted like this:

rec_022/head/000000/000131.jpg 0.453840661720333 0.057788951913994 -0.889207001100381
rec_022/head/000022/000131.jpg 0.187331672118635 0.036473247576072 -0.981619349255347
rec_022/head/000001/000131.jpg 0.376386464341092 0.055499673938626 -0.924798905521367
rec_022/head/000002/000131.jpg 0.286355056945921 0.029490498088630 -0.957669615203481
rec_022/head/000584/000131.jpg 0.430812587090941 0.056813836181665 -0.900651265930567

I want to know how these values are computed:

0.453840661720333 0.057788951913994 -0.889207001100381
0.187331672118635 0.036473247576072 -0.981619349255347
0.376386464341092 0.055499673938626 -0.924798905521367
0.286355056945921 0.029490498088630 -0.957669615203481
0.430812587090941 0.056813836181665 -0.900651265930567

Is there a formula for computing these values, or some code that generates them?
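For reference, this is how I currently read such a line; a minimal parsing sketch assuming the fields simply repeat as "path x y z" (my reading of the sample above, not a documented format):

```python
import numpy as np

def parse_train_line(line):
    """Split one train.txt line into (image_path, 3-float vector) records.

    Assumes fields repeat as: path x y z path x y z ...
    (an assumption based on the sample above, not an official spec).
    """
    fields = line.split()
    records = []
    for i in range(0, len(fields), 4):
        path = fields[i]
        vec = np.array([float(v) for v in fields[i + 1:i + 4]])
        records.append((path, vec))
    return records

line = ("rec_022/head/000000/000131.jpg "
        "0.453840661720333 0.057788951913994 -0.889207001100381")
for path, vec in parse_train_line(line):
    print(path, vec, np.linalg.norm(vec))  # the vectors are ~unit length
```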

erkil1452 commented 2 years ago

These are the GT gaze labels in the eye coordinate system. You can find the formulas in our paper (Eq. 1).
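In outline, Eq. 1 takes the unit eye-to-target vector and expresses it in an eye coordinate system whose z-axis points from the eye toward the camera. A minimal numpy sketch of that computation (the input names and the exact axis conventions are my illustration of the idea, not the released dataset scripts):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def gaze_in_eye_coords(eye_pos, target_pos, camera_pos):
    """Sketch of an Eq. 1 style label computation.

    All three inputs are 3D points in one shared world coordinate
    system. Axis conventions are an assumption: z points from the eye
    to the camera, x is horizontal, y completes a right-handed basis.
    """
    up = np.array([0.0, 1.0, 0.0])             # assumed world up direction
    z = normalize(camera_pos - eye_pos)        # eye -> camera
    x = normalize(np.cross(up, z))             # horizontal axis (degenerate if camera is straight above)
    y = np.cross(z, x)                         # completes the right-handed basis
    R = np.stack([x, y, z])                    # world -> eye rotation; rows are the eye axes
    g_world = normalize(target_pos - eye_pos)  # Eq. 1: unit eye-to-target vector
    return R @ g_world                         # gaze expressed in eye coordinates
```

With this convention, a subject looking roughly away from the camera gets a negative z component, which matches the labels quoted above.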

jxncyym commented 2 years ago

@erkil1452 I see the formulas. So can we use the code in https://github.com/erkil1452/gaze360/issues/30 to compute the GT gaze labels? There are some other questions I want to ask:

  1. If our camera can only capture 2D coordinates, how do we convert the 2D coordinates to 3D?
  2. Can the Gaze360 algorithm only be used with data in 3D coordinate format?
  3. Are there good algorithms that work with data in 2D coordinate format?

erkil1452 commented 2 years ago

Yes, you can use these code snippets to convert a gaze vector expressed in world coordinates into the (network's) eye coordinates (and back).

  1. It is not clear what your intention is. If you want to reproduce our annotation procedure, you first need to determine the 3D location of the eyes and the 3D position of the gaze target in some world coordinate system of your choice. There is no general way to convert a 2D coordinate into 3D without additional information about the scene.
  2. Our method is designed for 3D gaze prediction. You could feasibly formulate your own problem definition in which the prediction happens in 2D by projecting the 3D labels onto a 2D manifold of your choice (see the sketch after this list).
  3. There are many 2D gaze predictors. I can shamelessly refer you to our own GazeCapture dataset and iTracker method: https://gazecapture.csail.mit.edu/
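Regarding point 2, a common 2D parameterization is yaw/pitch (spherical angles). A minimal sketch of that round trip; the sign and axis conventions below, with (0, 0) meaning "looking along -z, away from the camera", are one common choice and an assumption on my part, not necessarily what this repository uses:

```python
import numpy as np

def cartesian_to_spherical(g):
    """Unit 3D gaze vector -> (yaw, pitch) in radians."""
    x, y, z = g
    yaw = np.arctan2(x, -z)                   # horizontal angle
    pitch = np.arcsin(np.clip(y, -1.0, 1.0))  # elevation
    return yaw, pitch

def spherical_to_cartesian(yaw, pitch):
    """(yaw, pitch) -> unit 3D gaze vector (inverse of the above)."""
    return np.array([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),
    ])

# Round trip on a label from train.txt (unit vector up to rounding):
g = np.array([0.453840661720333, 0.057788951913994, -0.889207001100381])
yaw, pitch = cartesian_to_spherical(g)
print(np.allclose(spherical_to_cartesian(yaw, pitch), g, atol=1e-5))  # True
```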