ParisNeo / FaceAnalyzer

A Python library for face detection and feature extraction based on the mediapipe library
MIT License
44 stars 11 forks source link

How are the face_3d_reference_positions calculated in Face.py? #1

Closed — sambhavjain98 closed this 2 years ago

sambhavjain98 commented 2 years ago

Hi, I have been using your FaceAnalyzer library and it has been super useful in estimating the head pose and blinking parameters. Now I am working on tracking the head position with respect to the camera in cm, and mapping the corresponding input to my screen object.

I have noticed that you have used:

```python
face_3d_reference_positions = np.array([
    [0, 0, 0],          # Nose tip
    [-80, 50, -90],     # Left
    #[0, -70, -30],     # Chin
    #[80, 50, -90],     # Right
    [-70, 50, -70],     # Left left eye
    [70, 50, -70],      # Right right eye
    [0, 80, -30]        # forehead center
])
```
I have a couple of doubts:

1. I am curious how you were able to get these values.
2. I am trying to make sense of the position coordinates returned by `get_head_posture()`. Currently I am getting `[[ 292.50133269] [-212.73998924] [ 580.38242832]]` as x, y, z even when I am in the centre of the camera. I was expecting something like `[0, 0, 580]`, or am I missing any steps? P.S. I did make a few changes to make this library compatible with a Logitech C922 webcam at 1920x1080 resolution, since your current version doesn't completely support other webcam resolutions properly. I'll make sure to make the changes and raise a pull request soon. :smile:
ParisNeo commented 2 years ago

Hi, and sorry for being late. I actually built this library a while ago mainly to play with my little daughter, using it for face switching and mapping. Then I found it useful to add extra options to track eyes, blinks, and head motion, for robot control for example, which kids love.

As for your question:

To estimate the face posture, I need a 3D reference face from which I can get the 3D coordinates of some fixed vertices; then, using a PnP problem solver, I can find the transformation that leads to the 2D projection of those vertices (the landmarks).
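
A minimal sketch of that idea (the reference vertices and pixel positions below are made up for illustration; the library's actual call, quoted later in this thread, uses `cv2.solvePnPRansac`):

```python
import cv2
import numpy as np

# Hypothetical inputs for illustration: four 3D reference vertices (rough
# mm-scale, nose tip at the origin) and their detected 2D pixel positions.
reference_3d = np.array([[0, 0, 0],       # nose tip
                         [-70, 50, -70],  # left eye region
                         [70, 50, -70],   # right eye region
                         [0, 80, -30]],   # forehead center
                        dtype=np.float64)
landmarks_2d = np.array([[960, 540], [860, 480], [1060, 480], [960, 400]],
                        dtype=np.float64)

# Rough pinhole camera: focal length ~ image width, optical center at the
# image center, no distortion (the same approximation discussed below).
w, h = 1920, 1080
camera_matrix = np.array([[w, 0, w / 2],
                          [0, w, h / 2],
                          [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros((4, 1))

# Solve the PnP problem: find the rotation (rvec) and translation (tvec)
# that project reference_3d onto landmarks_2d. EPnP copes with small
# non-planar point sets.
success, rvec, tvec = cv2.solvePnP(reference_3d, landmarks_2d,
                                   camera_matrix, dist_coeffs,
                                   flags=cv2.SOLVEPNP_EPNP)
print(rvec, tvec)
```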

As I had no access and no time to get the 3D face model that Google's mediapipe library actually uses, I simply took the measurements on my own face. It is a rough measurement done using a ruler! I know, we can do better. LOL.

A better and more precise way would require finding the 3D reference mesh file, putting it inside a 3D modeling software, and reading off the positions of those vertices.

The dlib library examples use another standard face with specific landmarks. But they are far fewer than the ones in mediapipe, and some of the vertices they use are on the sides of the mouth. I don't like this because it supposes that you never smile or laugh. So I tried other vertices that seemed more fixed to me.

Here is a great tutorial on how to do this using DLIB: https://learnopencv.com/head-pose-estimation-using-opencv-and-dlib/

As for the weird position: the (0, 0, 0) position is the nose position in a virtual 3D space. Once again, it is not very precise, and the fact that we have different faces (nose length, for example) is a factor that can lead to poor results. For me it works (not perfectly, as I can see a little offset), but it is fine enough.

Feel free to contribute if this little project interests you. There is nothing very special in this library, as it uses state-of-the-art methods that you can find on the internet and is based on the mediapipe library. The interesting thing here is that it gives us a toolbox for face landmark manipulation, based on a powerful face landmark detection library, that can be useful for multiple applications out of the box.

If I have time next holidays I'll take it to the next level and add some examples with face morphing and such. Unfortunately I can't include assets that are not under open licenses. So if you want, after cloning the library, you can use your own assets and try to play with them.

sambhavjain98 commented 2 years ago

Thank you for your reply. Ah, I guessed it might have been your own face values! I have already tried dlib and I couldn't get accuracy comparable to the mediapipe library. I am working on an eye gaze tracking library and am stuck trying to estimate the head's position in world coordinates relative to the camera with respect to depth, so I can offset my eye tracker based on the position of the user.

The depth estimation is actually good enough, and so is the rotation matrix; I just had issues with the x and y coordinates of the translation vector. I assume I must now find a 3D face model and capture those coordinates to get accurate results.

Sure, I found the whole mediapipe library quite amazing and will contribute new features! I am using it as a toolbox and it was useful when I was starting out with mediapipe.

Do you by any chance have an idea how to calculate how much a person moved (in cm) left or right with just depth estimation and the focal length?

ParisNeo commented 2 years ago

Sorry for being late, I've been very busy lately. Well, I think if you find the 3D face model used by mediapipe, you can greatly improve the estimation.

As for head position/orientation: I am using the OpenCV solvePnPRansac solver. It takes the 3D positions of your reference points and the 2D positions obtained from your camera, as well as the camera matrix, which in my case models a simple standard camera with no distortion and uses the image dimensions to roughly estimate a focal length (which may be a big problem if you have a wide-angle camera or similar), and finds the object coordinates in a reference frame linked to the center of your camera:

```python
(success, face_ori, face_pos, _) = cv2.solvePnPRansac(
    self.face_3d_reference_positions.astype(np.float),
    face_2d_positions.astype(np.float),
    camera_matrix,
    dist_coeffs,
    flags=cv2.SOLVEPNP_ITERATIVE)
```

The FaceAnalyzer library allows you to set this matrix, and if you know the distortion coefficients of your camera you can actually get better results, along with better model 3D positions. Here is a link to a tutorial on how you can calibrate your camera and get the distortion coefficients: https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html
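
The linked tutorial boils down to photographing a chessboard from several angles; a condensed sketch of it, assuming the shots live in a hypothetical `calib/` folder:

```python
import glob
import cv2
import numpy as np

# Condensed version of the linked OpenCV tutorial, assuming several photos
# of a 9x6 inner-corner chessboard stored as calib/*.jpg.
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.jpg"):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Needs at least a handful of successful detections to be meaningful.
ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(camera_matrix, dist_coeffs)
```

The resulting `camera_matrix` and `dist_coeffs` can then replace the rough image-dimension estimates in the pose estimation.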

I hope this can help you. If by any chance you find the mediapipe 3D face mesh, I would appreciate it if you added it to FaceAnalyzer and made a pull request.

Thanks in advance.

As for depth information, do you mean using a depth-capable camera, or are you talking about the depth information given by mediapipe?

I think I have seen a mediapipe example where they use the iris estimation to determine the distance to the camera: https://ai.googleblog.com/2020/08/mediapipe-iris-real-time-iris-tracking.html
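
The trick in that iris example is that the human iris diameter is nearly constant across people (about 11.7 mm according to the post), so with a known focal length the distance falls out of the pinhole model. A rough sketch (`iris_diameter_px` would come from measuring the iris landmarks):

```python
# Sketch only: assumes the ~11.7 mm average iris diameter cited in the blog post.
IRIS_DIAMETER_MM = 11.7

def distance_from_iris(iris_diameter_px: float, focal_length_px: float) -> float:
    """Pinhole model: real_size / distance == pixel_size / focal_length."""
    return focal_length_px * IRIS_DIAMETER_MM / iris_diameter_px

# An iris spanning 40 px seen with a 1920 px focal length is roughly 562 mm away.
print(distance_from_iris(40, 1920))
```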

For a depth camera, you would need to calibrate the RGB part against the depth one. I don't really know for sure, but I think there should be a way: given the focal length of your camera, you can find the 3D position of a point:

```
P3D.x = (x_d - cx_d) * depth(x_d, y_d) / fx_d
P3D.y = (y_d - cy_d) * depth(x_d, y_d) / fy_d
P3D.z = depth(x_d, y_d)
```

Here is where you can find a tutorial: http://nicolas.burrus.name/index.php/Research/KinectCalibration
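
Those formulas translate directly into code. A minimal sketch, assuming `fx, fy, cx, cy` come from the depth camera's intrinsics; note that the X term is also the answer to your earlier left/right question, expressed in the same unit as the depth:

```python
import numpy as np

def back_project(x_d: float, y_d: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project a pixel and its depth value into 3D camera coordinates,
    following the formulas above."""
    X = (x_d - cx) * depth / fx  # lateral (left/right) offset
    Y = (y_d - cy) * depth / fy  # vertical offset
    return np.array([X, Y, depth])
```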

I hope this helps

ParisNeo commented 2 years ago

Hi again,

The new version of FaceAnalyzer uses the actual canonical face as reference. It is way more robust than the old version. I have added a blender file of the canonical face. You may try playing with it and selecting other vertices if you want.

Best regards

sambhavjain98 commented 2 years ago

Hello ParisNeo, I checked out and tried your last reply from 11 days ago, and I was able to find the issue with tvec not being centered in the middle of the screen. I resolved it by calibrating the camera and using my own camera calibration matrix as well as distortion coefficients.

I was about to use the 3D canonical face obj file provided by mediapipe, but my project got put on hold for a week or maybe two before I can continue.

I must thank you for helping me with this; I was able to bring up my project quickly with your module.

Regarding the latest changes, let me try it out on the weekend and test it with different vertices as well.

I have another doubt: which one is better, solvePnPRansac or just the plain PnP methods in cv2? Did you give this any thought before choosing solvePnPRansac?

ParisNeo commented 2 years ago

Well, solvePnPRansac uses the RANSAC algorithm, which is robust to small errors in vertex positions.

Namely, RANSAC can operate even if some of your data is corrupted or bad. That's why I've chosen it over plain PnP. RANSAC, as its name suggests, will randomly sample the set and try to find the transformation that reduces the error between the transformed vertices and the measured ones, keeping the sample that results in the least error. With this technique, if some vertices are off, the algorithm will just throw them away and use the others, while the simple algorithm will use every vertex, which can result in a bad estimation. I think it is used in the dlib examples because they use the mouth vertices. So I guess, when you smile, RANSAC will ditch the bad vertices and use the other ones instead.

In my case, I did choose some pretty fixed vertices, so the choice of RANSAC can be questioned here. Maybe I should add the possibility to choose between solvePnPRansac and plain solvePnP.
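
For reference, the two OpenCV calls are nearly drop-in replacements for each other; a sketch of such a switch (`use_ransac` is a hypothetical flag, not part of the library):

```python
import cv2

def estimate_pose(points_3d, points_2d, camera_matrix, dist_coeffs,
                  use_ransac=True):
    """Switch between plain PnP and its RANSAC variant; the RANSAC version
    additionally reports which correspondences it kept as inliers."""
    if use_ransac:
        success, rvec, tvec, inliers = cv2.solvePnPRansac(
            points_3d, points_2d, camera_matrix, dist_coeffs,
            flags=cv2.SOLVEPNP_ITERATIVE)
    else:
        success, rvec, tvec = cv2.solvePnP(
            points_3d, points_2d, camera_matrix, dist_coeffs,
            flags=cv2.SOLVEPNP_ITERATIVE)
    return success, rvec, tvec
```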

If you want, I added to the assets folder a high-resolution image with the indices of all vertices. Combining the blender file and the image, you can play with the tool as you like.

Good luck