Open liuquande opened 2 years ago
have the same problem here
The problem is that the model works at 96x96 resolution, so it downscales the face square and then upscales the result to fit your source video. There's no solution. You can only train a hi-res model )
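To make the downscale/upscale round trip concrete, here is a minimal NumPy-only sketch. It assumes a hypothetical 192x192 face crop, block averaging for the downscale and pixel repetition for the upscale; the actual pipeline resizes with OpenCV, so this only illustrates the effect and is not the project's code.

```python
import numpy as np

def downscale_2x(img):
    """Average each 2x2 block (a crude stand-in for a real resize)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale_2x(img):
    """Repeat every pixel 2x2 (nearest-neighbour upscaling)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

# Hypothetical 192x192 "face crop": a one-pixel checkerboard, i.e. pure fine detail.
yy, xx = np.indices((192, 192))
face = ((yy + xx) % 2).astype(np.float64) * 255.0

small = downscale_2x(face)     # 96x96, the resolution the model works at
restored = upscale_2x(small)   # stretched back to 192x192 for pasting

detail_before = face.std()     # contrast in the original crop
detail_after = restored.std()  # contrast that survives the round trip
```

In this toy case the fine pattern is averaged away completely, which is the same kind of loss that makes the pasted square look softer than the rest of the frame.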
@liuquande I trained the model using AVSpeech and ran into the same problem. Do you use the pretrained model?
This happens because the detected face image is resized to 96x96 before being fed to the network, and the lip-synced output face, which is also 96x96, is resized back to the original dimensions of the face. If the original face is larger than 96x96, up-sampling the lip-synced output (probably by interpolation) can cause those blurry edges. The bigger the difference between the original face dimensions and the 96x96 input, the more conspicuous those edges will be. I see two straightforward ways to deal with it:
- Downsample the input frames using the --resize_factor argument. This reduces the video resolution, but it also reduces the face dimensions and mitigates the square effect.
- (requires modifying code) Calculate a face mask for each frame so that only the lip-synced face itself is pasted and not its surroundings. Many libraries can do this in different ways, such as MediaPipe and face-parsing.
I implemented the second option (the face mask) and it completely eliminated the square while keeping the lip-synced face intact.
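For the first option, a quick back-of-the-envelope calculation shows how much the 96x96 output gets stretched. This is only a sketch: the helper names and the 384-pixel face are hypothetical, and it assumes the resize factor is an integer that divides the frame size.

```python
import math

MODEL_INPUT = 96  # side of the square face crop, as described in the thread

def upsampling_ratio(face_side_px, resize_factor=1):
    """How far the 96x96 output is stretched to cover the detected face."""
    return (face_side_px / resize_factor) / MODEL_INPUT

def min_resize_factor(face_side_px):
    """Smallest integer factor that brings the face down to at most 96 px."""
    return max(1, math.ceil(face_side_px / MODEL_INPUT))

ratio_native = upsampling_ratio(384)      # hypothetical 384 px face, no resizing
ratio_resized = upsampling_ratio(384, 2)  # same face with a resize factor of 2
factor_needed = min_resize_factor(384)    # factor at which no stretching is left
```

The larger the ratio, the more visible the square is likely to be; a ratio of 1 or less means the output is not stretched at all, at the cost of a lower-resolution video.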
Could you please share the modified code? I'm a beginner, and this issue has been bothering me for a long time. Thank you very much!
It's a bit problematic for me to send the code. However, I can send the modifications. Take them as rough guidelines rather than exact. Read about how to use MediaPipe.
Install MediaPipe:
pip install mediapipe==0.10.0
Download face landmarker model and put it in the "weights" folder:
wget -O weights/face_landmarker_v2_with_blendshapes.task -q https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task
Add MediaPipe imports:
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import mediapipe as mp
Add face mask args:
parser.add_argument('--face_landmarks_detector_path', default='weights/face_landmarker_v2_with_blendshapes.task',
                    type=str, help='Path to face landmarks detector')
parser.add_argument('--with_face_mask', action='store_true',
                    help='Blend output into original frame using a face mask rather than directly blending the face box. This prevents a lower resolution square artifact around lower face')
In main(), in the loop starting with "for p, f, c in ...", change the line
f[y1:y2, x1:x2] = p
to:
if args.with_face_mask:
    mask = face_mask_from_image(p, face_landmarks_detector)
    f[y1:y2, x1:x2] = f[y1:y2, x1:x2] * (1 - mask[..., None]) + p * mask[..., None]
else:
    f[y1:y2, x1:x2] = p
Add face_mask_from_image function:
def face_mask_from_image(image, face_landmarks_detector):
    """
    Calculate face mask from image. This is done by detecting face landmarks and filling their convex hull.

    Args:
        image: numpy array of an image
        face_landmarks_detector: MediaPipe face landmarks detector
    Returns:
        A uint8 numpy array with the same height and width of the input image, containing a binary mask of the face in the image
    """
    # initialize mask
    mask = np.zeros((image.shape[0], image.shape[1]), dtype=np.uint8)
    # detect face landmarks
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)
    detection = face_landmarks_detector.detect(mp_image)
    if len(detection.face_landmarks) == 0:
        # no face detected - set mask to all of the image
        mask[:] = 1
        return mask
    # extract landmarks coordinates (landmarks are normalized, so scale to pixels)
    face_coords = np.array([[lm.x * image.shape[1], lm.y * image.shape[0]] for lm in detection.face_landmarks[0]])
    # calculate convex hull from face coordinates
    convex_hull = cv2.convexHull(face_coords.astype(np.float32))
    # apply convex hull to mask
    return cv2.fillPoly(mask, pts=[convex_hull.squeeze().astype(np.int32)], color=1)
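To see what the masked paste does to the pixels, here is a small self-contained sketch of the same blending expression on toy arrays. The 4x4 region, the pixel values and the hand-made square mask are made up for illustration; in the real code the mask would come from face_mask_from_image.

```python
import numpy as np

# Toy 4x4 RGB "frame region" (all 10) and "lip-synced face" (all 200), both uint8.
region = np.full((4, 4, 3), 10, dtype=np.uint8)
pred = np.full((4, 4, 3), 200, dtype=np.uint8)

# Binary mask: 1 inside the pretend face, 0 outside. Here: the central 2x2 block.
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1

# Same expression as in the loop; mask[..., None] broadcasts over the colour channels.
blended = region * (1 - mask[..., None]) + pred * mask[..., None]
```

Because the mask only contains 0 and 1, every output pixel is copied from exactly one of the two images, so the result stays uint8 and cannot overflow. Pixels outside the mask keep the original frame, which is why the low-resolution square border disappears.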
Thank you very much for your help!
I tried this code in inference.py, but it fails with an error: face_landmarks_detector is not defined. Can you show how to create this object with MediaPipe?
mask = face_mask_from_image(p, face_landmarks_detector) f[y1:y2, x1:x2] = f[y1:y2, x1:x2] * (1 - mask[..., None]) + p * mask[..., None] else: f[y1:y2, x1:x2] = p
This code is incomplete: there is an else, so why is there no if before it?
Thanks, it works!
Hi. For those wondering how to create face_landmarks_detector, here is the code:
BaseOptions = mp.tasks.BaseOptions
FaceLandmarker = mp.tasks.vision.FaceLandmarker
FaceLandmarkerOptions = mp.tasks.vision.FaceLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
options = FaceLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=args.face_landmarks_detector_path),
    running_mode=VisionRunningMode.IMAGE)
face_landmarks_detector = FaceLandmarker.create_from_options(options)
But this won't necessarily solve the problem, because the edge of the pasted face can still look inconsistent with the rest of the frame. The only real solution may be to train on high-res images.
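One common way to reduce a visible seam at a hard mask edge is to feather the mask before blending. Nobody in this thread proposed or tested this, so take it as an assumption on my part: the feather_mask and blend_soft helpers, the box blur and the radius are all hypothetical, and the sketch uses only NumPy on toy data.

```python
import numpy as np

def feather_mask(mask, radius=2):
    """Soften a binary mask with a (2*radius+1)^2 box blur; returns float32 in [0, 1]."""
    k = 2 * radius + 1
    padded = np.pad(mask.astype(np.float32), radius, mode="edge")
    out = np.zeros(mask.shape, dtype=np.float32)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out / (k * k)

def blend_soft(region, pred, mask, radius=2):
    """Alpha-blend pred over region using the feathered mask; returns uint8."""
    alpha = feather_mask(mask, radius)[..., None]
    mixed = region.astype(np.float32) * (1.0 - alpha) + pred.astype(np.float32) * alpha
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)

# Toy data: dark frame region, bright prediction, square mask in the middle.
region = np.full((12, 12, 3), 10, dtype=np.uint8)
pred = np.full((12, 12, 3), 200, dtype=np.uint8)
mask = np.zeros((12, 12), dtype=np.uint8)
mask[3:9, 3:9] = 1
out = blend_soft(region, pred, mask)
```

Unlike the binary blend, the computation is done in float and converted back to uint8 at the end, since fractional mask values would otherwise be truncated. Pixels near the mask border become a mix of both images, which trades a sharp seam for a short gradient.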
Using --resize_factor doesn't seem to work when my videos are already 720p.
hit the same problem
Don't know if this is still useful, but I managed to make it work:
BaseOptions = mp.tasks.BaseOptions
FaceLandmarker = mp.tasks.vision.FaceLandmarker
FaceLandmarkerOptions = mp.tasks.vision.FaceLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
options = FaceLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=args.face_landmarks_detector_path),
    running_mode=VisionRunningMode.IMAGE)
face_landmarks_detector = FaceLandmarker.create_from_options(options)
for p, f, c in zip(pred, frames, coords):
    y1, y2, x1, x2 = c
    p = cv2.resize(p.astype(np.uint8), (x2 - x1, y2 - y1))
    if args.with_face_mask:
        mask = face_mask_from_image(p, face_landmarks_detector)
        f[y1:y2, x1:x2] = f[y1:y2, x1:x2] * (1 - mask[..., None]) + p * mask[..., None]
    else:
        f[y1:y2, x1:x2] = p
    out.write(f)
For me it was really important to remove tqdm, including its import (on Windows); otherwise I got WinError 6 (handle not valid). I think it's something about multithreading, don't want to know...
Hope it helps ✌️
Dear author,
Thanks for sharing the excellent work.
I found that when using my personal video, there is a clear box region around the mouth in the output result, as shown below:
What could be the reason for this, and could you please give me some instructions on how to solve it?
Many thanks for the help.
Best.