IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

Face Scanning Features #3716

Closed grigala closed 5 years ago

grigala commented 5 years ago

Required Info
Camera Model: D415
Firmware Version: 05.11.01.100
Operating System & Version: Win 10 / Linux (Ubuntu 18)
Kernel Version (Linux Only): 4.15.0
Platform: PC
SDK Version: 2.20.0
Language: Python
Segment: Capturing Face Scans

Computer Vision Middleware (3D scanning, face scanning, landmarks, etc.)

As far as I understand from previous issues (https://github.com/IntelRealSense/librealsense/issues/906, https://github.com/IntelRealSense/librealsense/issues/3493), the new RealSense SDK doesn't support any of the computer vision middleware capabilities that the previous (now deprecated) SDK used to.

I'm currently working on a project that requires face images (RGB, depth/point cloud) to be extracted from the D415 camera module. However, I couldn't find any tool that would allow me to capture specific regions (in our case, the face) instead of the entire scene. The previous SDK used to have a powerful API for these kinds of tasks, with face, landmark, pose, and expression detection tools, as well as an entire module dedicated to the 3D scanning procedure.

I understand that there are no such features in this SDK and none are planned for the near future, but maybe there's something I'm missing. Are there any alternative open-source solutions to this problem using the D415 camera? Maybe an add-on to the RealSense SDK, or something like that?

I found the Dot3D Scan software integrated into the RealSense SDK, but it's a commercial product designed to tackle much bigger tasks than face scanning.

Besides that, the only solution I can think of at this point is to process the entire captured scene and somehow crop the face region. This sounds relatively easy and straightforward for color images, but what about the depth or point cloud versions of the same frame?

Any recommendations would be appreciated!

MartyG-RealSense commented 5 years ago

The easiest solution would be to use a commercial package called the Nuitrack SDK. It primarily does skeletal tracking, but added excellent face landmark tracking in late 2018. Nuitrack costs around $30 a year and has a free trial version.

https://community.nuitrack.com/t/face-tracking-added-to-nuitrack-sdk/734

grigala commented 5 years ago

Thanks for the reply! I think Nuitrack is designed for a different application domain, such as object tracking, and not necessarily for scanning.

I found some alternative ways to create facial landmarks through various C++/Python libraries like OpenCV and dlib. I don't know yet how straightforward it would be to integrate them with the RealSense SDK.

I'll post updates here and maybe eventually send a PR; please don't close the thread.

grigala commented 5 years ago

Incremental update:

As many of you already know, detecting facial landmarks on a 2D image is very easy using a pretrained model and helper libraries like dlib.

What I'm trying to do is detect landmarks on the color image (say [200, 200] is a landmark pixel) and then obtain the 3D point coordinates from the pixel and the depth intrinsic parameters, as in https://github.com/IntelRealSense/librealsense/issues/1904:

# Obtain the depth value (in meters) for a given pixel
z = depth_frame.get_distance(200, 200)  # [200, 200] is a landmark pixel
# Deproject the pixel and its depth into a 3D point in camera space
point = rs.rs2_deproject_pixel_to_point(depth_intrinsics, [200, 200], z)

In that way we find the corresponding points in 3D space. Just for the sake of this example, say it returns the point (0.03837661072611809, -0.07978643476963043, 0.44700002670288086). Now the problem is that I cannot find this point in the point cloud mesh generated from the same frame by calling the export_to_ply() method.
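
For reference, the mesh is produced roughly like this (the filename here is illustrative):

pc = rs.pointcloud()
pc.map_to(color_frame)              # texture-map the cloud to the color frame
points = pc.calculate(depth_frame)  # rs.points object for the current depth frame
points.export_to_ply("mesh.ply", color_frame)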

Maybe somebody who has more experience with this kind of mapping can point out how accurate it is. Am I missing something? Is there any other, more straightforward way to find the correspondence between a pixel and a point?

Follow-up question: what happens if we deproject every single pixel of the image into the 3D coordinate system? Can we produce a point cloud that way?
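
Naively, I mean something like this (slow) sketch; presumably the SDK's rs.pointcloud() block does the same thing natively and much faster:

import numpy as np
import pyrealsense2 as rs

def deproject_all_pixels(depth_frame, intrinsics):
    # Deproject every valid depth pixel into a 3D point (camera space)
    points = []
    for v in range(intrinsics.height):
        for u in range(intrinsics.width):
            z = depth_frame.get_distance(u, v)
            if z > 0:  # skip pixels with no valid depth reading
                points.append(rs.rs2_deproject_pixel_to_point(intrinsics, [u, v], z))
    return np.array(points)

# The SDK's own point cloud block computes the same thing natively:
pc = rs.pointcloud()
cloud = pc.calculate(depth_frame)  # rs.points, exportable via export_to_ply()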

dorodnic commented 5 years ago

Hi @grigala, I'm glad you are on the right path. Yes, dlib and others offer excellent facial landmark services. I think you are missing the depth-to-RGB alignment step (pixel coordinates in the RGB image do not correspond directly to pixels in the depth image). Please take a look at the rs-align, rs-dnn, and rs-measure examples, and the other examples that combine depth and color information.
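
For reference, a minimal pyrealsense2 alignment setup looks roughly like this (the stream resolutions and formats here are illustrative):

import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

# Align depth frames to the color sensor's viewpoint, so that pixel
# (u, v) in the color image corresponds to pixel (u, v) in depth
align = rs.align(rs.stream.color)
frames = pipeline.wait_for_frames()
aligned_frames = align.process(frames)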

RealSenseCustomerSupport commented 5 years ago

@grigala Any other questions regarding Dorodnic's reply? Looking forward to your update. Thanks!

grigala commented 5 years ago

Hi @dorodnic, @RealSenseCustomerSupport ,

Sorry, it took me a while to see your comment.

First of all, thank you for the reply.

Regarding the issue: yes, I'm aligning the color and depth frames first and using them throughout my project:

frames = pipeline.wait_for_frames()
aligned_frames = align.process(frames)  # align depth to the color viewpoint

aligned_depth_frame = aligned_frames.get_depth_frame()
aligned_color_frame = aligned_frames.get_color_frame()

but I still cannot find the deprojected points in the point cloud mesh.

Here, let me show you a small demo of how the point cloud mesh and the deprojected landmark pixels (red dots) line up (the points are multiplied by 1000 to match project requirements):

[image: point cloud mesh with deprojected landmark points overlaid]

Not only are the points way out in front of the ones in the point cloud, but the face is upside down.

The only thing that matters for my project at this point is to find the location(even with some uncertainty) of facial landmark points in the point cloud.

Of course I can apply some transformations to the landmark points, but it would be much harder to get them exactly where they should be. For example, since the landmarks are in front of the face, I could manipulate the Z-direction to move them closer, and I'm sure there would be some way to flip them too, but is this the only/right way to do it? I feel like I'm missing something here. Why is the face (formed by the landmarks) upside down?

Edit: My bad, I was actually multiplying the y and z coordinates of the points by -1, as in: https://github.com/IntelRealSense/librealsense/blob/d6f6be84b46190c8c84c95f6ac279d239320fcda/src/archive.cpp#L68 So it was actually the mesh that was upside down, not the landmarks. Now they are in the same orientation; ignore this specific question about orientation.

P.S. Once again, this is the path by which those landmarks are obtained (a compact sketch follows the list):

  1. Detect facial landmarks on the color image using dlib -> get the (w, h) pixel location of each landmark.
  2. Pass each landmark's pixel location in to first calculate the depth: z = aligned_depth_frame.get_distance(w, h)
  3. Then use this depth (z) and pixel information ([w, h]), together with the intrinsic camera parameters, to get the point in 3D space: rs.rs2_deproject_pixel_to_point(depth_intrinsics, [w, h], z)
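
A compact sketch of these three steps, reusing the aligned frames from above (the dlib predictor model path here is illustrative):

import dlib
import numpy as np
import pyrealsense2 as rs

# Intrinsics of the aligned depth stream, taken from the frame's profile
depth_intrinsics = aligned_depth_frame.profile.as_video_stream_profile().intrinsics

# 1. Detect facial landmarks on the color image with dlib
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
color_image = np.asanyarray(aligned_color_frame.get_data())

landmarks_3d = []
for face in detector(color_image):
    shape = predictor(color_image, face)
    for i in range(shape.num_parts):
        w, h = shape.part(i).x, shape.part(i).y
        # 2. Depth (in meters) at the landmark pixel
        z = aligned_depth_frame.get_distance(w, h)
        # 3. Deproject pixel + depth into a 3D point
        landmarks_3d.append(rs.rs2_deproject_pixel_to_point(depth_intrinsics, [w, h], z))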

Thank you again for the follow-up!

rajeshjoshi8792 commented 2 years ago

@grigala how did you use the 3D data after calling rs.rs2_deproject_pixel_to_point(depth_intrinsics, [w, h], z)?

grigala commented 2 years ago

@rajeshjoshi8792 I don't know what you mean exactly, but here's the method I used it in:

https://github.com/grigala/3DMMDepthFitting/blob/075dd88d662a3c52262a23034646619cf9bb0a64/scala/src/main/resources/python/pipeline_utils.py#L147

As far as I remember, in the end I figured out how to solve this problem, but unfortunately it's been a while, so I don't remember exactly what it was all about.

rajeshjoshi8792 commented 2 years ago

@grigala I want to use depth data for detecting facial landmarks. I know how to detect facial landmarks in an RGB image, but I want to do it with depth data.

grigala commented 2 years ago

@rajeshjoshi8792

Depth means z-axis information in 3D space, so I'm not sure what is meant by

I want to use depth data for detecting facial landmarks

What you are probably talking about is detecting 3D landmarks in the point cloud, and this is exactly what I was doing.

Obviously, you cannot directly run landmark detection over the point cloud (at least as far as I know). You first have to detect pixel landmarks in an RGB image and then map those pixels to 3D points using the camera's intrinsic parameters. Again, this is exactly what I was doing in the project referenced above, and the method I kept referring to while describing this issue:

rs.rs2_deproject_pixel_to_point(depth_intrinsics, [w, h], z)

does exactly that: it takes the intrinsic parameters depth_intrinsics, the RGB image pixel [w, h], and its distance from the camera z (a.k.a. the depth), calculated by

z = depth_frame.get_distance(w, h)

and in the end gives you the 3D point that corresponds to that pixel in the RGB image.

rajeshjoshi8792 commented 2 years ago

@grigala thank you for the quick reply. I already followed the code above and got the 3D points, but I don't know how to use them further. I want to visualize those 3D points (facial landmarks).

rajeshjoshi8792 commented 2 years ago

@grigala can you help me with this? I am able to get the 3D points, but how do I show them on the depth data?

grigala commented 2 years ago

@rajeshjoshi8792 you need some sort of point cloud/polygon visualizer that will allow you to pick certain points and color-grade them, or something like that. If your point cloud has a texture, you can tinker with that too, e.g. paint the detected points a color that stands out enough to recognize them.

I don't remember whether the default toolkit has a tool like that, but there are certainly quite a few alternatives out there, depending on your needs.
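
For example, a third-party viewer such as Open3D (not part of the RealSense SDK) can overlay the landmark points on the exported cloud. A rough sketch, assuming a list of deprojected landmark points (landmarks_3d) and a mesh.ply file as discussed earlier in this thread:

import numpy as np
import open3d as o3d  # third-party viewer, not part of the RealSense SDK

cloud = o3d.io.read_point_cloud("mesh.ply")  # e.g. produced by export_to_ply()
landmarks = o3d.geometry.PointCloud()
landmarks.points = o3d.utility.Vector3dVector(np.asarray(landmarks_3d))
landmarks.paint_uniform_color([1.0, 0.0, 0.0])  # red, so the points stand out
o3d.visualization.draw_geometries([cloud, landmarks])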

In case you are interested, I was visualizing everything using the Scalismo-UI tool, as my work was focused on the Scalismo framework pipeline.