Hey, thanks for your interest in SMIRK :)
Errors:
You need to download the mediapipe face model and place it at `assets/face_landmarker.task`. Direct link: https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task
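For example, a minimal way to fetch it into the location the script expects:

```python
import os
import urllib.request

# Download the mediapipe face landmarker model into the path that
# apply_mediapipe_to_dataset.py expects (assets/face_landmarker.task)
os.makedirs('assets', exist_ok=True)
urllib.request.urlretrieve(
    'https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task',
    'assets/face_landmarker.task')
```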
You are right, this is an oversight and the script does not currently support videos. In the case of videos you should use `cv2.VideoCapture`, get each frame individually, and apply the preprocessing per frame. I will also try to upload a corrected version of this file soon.
Generic Doubts:
https://github.com/georgeretsi/smirk/blob/main/datasets/mead_sides_dataset.py#L88
During our preprocessing we stored the mediapipe landmarks for MEAD sides in the same folder as the videos, so we just load them from there. For the FAN landmarks, we didn't have time to extract these for MEAD sides due to an approaching deadline, so we just skipped them (and we ignore the missing landmarks using the flag in https://github.com/georgeretsi/smirk/blob/main/datasets/base_dataset.py#L208).

Thanks for the quick response @filby89
So essentially, in the case of a video with N frames, should I be emulating what is done in the mediapipe processing, i.e., getting per-frame landmarks of shape [1, 68, 2], concatenating all of them into a numpy array of shape [N, 68, 2], and dumping them under the appropriate name? Thanks for clarifying the MEAD sides bit as well.
I'm still working my way through procuring the datasets and preprocessing them at a snail's pace. Will let you know if I encounter any hitches while retraining :smile:
@chakri1804 Yes, this is the process exactly as you mentioned :)
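For reference, a minimal sketch of that loop (`extract_landmarks` is a hypothetical stand-in for the per-frame mediapipe processing in `apply_mediapipe_to_dataset.py`):

```python
import cv2
import numpy as np

def landmarks_for_video(video_path, extract_landmarks):
    # extract_landmarks(frame) stands in for the per-frame mediapipe
    # landmark extraction and is assumed to return an array of shape (1, 68, 2)
    cap = cv2.VideoCapture(video_path)
    per_frame = []
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        per_frame.append(extract_landmarks(frame))
    cap.release()
    return np.concatenate(per_frame, axis=0)  # shape (N, 68, 2)

# then dump under the appropriate name, e.g.
# np.save('path/to/video_name.npy', landmarks_for_video(...))
```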
Hey @filby89
I noticed that the face-alignment preprocessor was not multi-threaded and would take ages on datasets like LRS2 and MEAD. So I went ahead and modified the mediapipe preprocessor, copying over snippets from face-alignment instead. Pasting the code here in case anyone needs it.

PS: This code only visualises the last frame of the video and dumps it at the given path, unlike the mediapipe one, which stitches a whole video with the landmarks plotted :smile:

PPS: The code tends to accumulate thread memory, so play around with the `num_processes` and `maxtasksperchild` params.
```python
from tqdm import tqdm
import numpy as np
import os
import cv2
from ibug.face_detection import RetinaFacePredictor
from ibug.face_alignment import FANPredictor
import argparse
from ibug.face_alignment.utils import plot_landmarks
from multiprocessing import Pool

# Initialize the argument parser
parser = argparse.ArgumentParser(description='Process images/videos with https://github.com/hhj1897/face_alignment.')
parser.add_argument('--input_dir', type=str, required=True, help='Input directory path')
parser.add_argument('--output_dir', type=str, required=True, help='Output directory path')
parser.add_argument('--vis_dir', type=str, help='Directory to save visualizations')
parser.add_argument('--num_processes', type=int, default=10, help='Number of processes to use for processing')
args = parser.parse_args()

# Collect all image/video files under the input directory (recursively)
all_files = []
for root, _, files in os.walk(args.input_dir):
    for file_name in files:
        if file_name.lower().endswith(('.jpg', '.png', '.mp4', '.avi')):
            all_files.append((root, file_name))


def process_image(root, file_name, output_path, vis_path, face_detector, landmark_detector):
    image = cv2.imread(os.path.join(root, file_name))
    detected_faces = face_detector(image, rgb=False)
    landmarks, scores = landmark_detector(image, detected_faces, rgb=False)
    np.save(output_path, landmarks)
    if vis_path:
        for lmks, scs in zip(landmarks, scores):
            plot_landmarks(image, lmks, scs, threshold=0.2)
        os.makedirs(os.path.dirname(vis_path), exist_ok=True)
        cv2.imwrite(vis_path, image)


def process_video(root, file_name, output_path, vis_path, face_detector, landmark_detector):
    cap = cv2.VideoCapture(os.path.join(root, file_name))
    frame_landmarks = []
    last_frame, last_landmarks, last_scores = None, None, None
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        detected_faces = face_detector(frame, rgb=False)
        landmarks, scores = landmark_detector(frame, detected_faces, rgb=False)
        frame_landmarks.append(landmarks)
        last_frame, last_landmarks, last_scores = frame, landmarks, scores
    cap.release()
    # Plot landmarks on the last frame only (the mediapipe script stitches a full video instead)
    if vis_path and last_frame is not None:
        for lmks, scs in zip(last_landmarks, last_scores):
            plot_landmarks(last_frame, lmks, scs, threshold=0.2)
        os.makedirs(os.path.dirname(vis_path), exist_ok=True)
        cv2.imwrite(os.path.splitext(vis_path)[0] + '.jpg', last_frame)
    if frame_landmarks:
        frame_landmarks = np.concatenate(frame_landmarks, 0)
        np.save(output_path, frame_landmarks)


def process_file(root, file_name, face_detector, landmark_detector):
    input_path = os.path.join(root, file_name)
    rel_path = os.path.relpath(input_path, args.input_dir)
    # Landmarks are saved as .npy files mirroring the input directory structure
    output_path = os.path.join(args.output_dir, os.path.splitext(rel_path)[0] + '.npy')
    vis_path = os.path.join(args.vis_dir, rel_path) if args.vis_dir else None
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    if file_name.lower().endswith(('.jpg', '.png')):
        process_image(root, file_name, output_path, vis_path, face_detector, landmark_detector)
    elif file_name.lower().endswith(('.mp4', '.avi')):
        process_video(root, file_name, output_path, vis_path, face_detector, landmark_detector)


def process_sample(sample):
    # Create a RetinaFace detector (mobilenet0.25 backbone) with the
    # confidence threshold set to 0.8
    face_detector = RetinaFacePredictor(threshold=0.8, device='cuda:0',
                                        model=RetinaFacePredictor.get_model('mobilenet0.25'))
    # Create a facial landmark detector
    landmark_detector = FANPredictor(device='cuda:0', model=FANPredictor.get_model('2dfan2_alt'))
    root, file_name = sample
    process_file(root, file_name, face_detector, landmark_detector)


if __name__ == '__main__':
    # maxtasksperchild recycles workers periodically to keep memory from accumulating
    with Pool(args.num_processes, maxtasksperchild=50) as pool:
        list(tqdm(pool.imap(process_sample, all_files), total=len(all_files)))
```
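(Assuming you save this as e.g. `apply_fan_parallel.py`, run it as `python apply_fan_parallel.py --input_dir <dataset_root> --output_dir <landmarks_root> --vis_dir <vis_root> --num_processes 4`; if memory still accumulates, lower `--num_processes` or the `maxtasksperchild` value in the `Pool` call.)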
Hey @chakri1804, thank you very much for your comment and code :) This will be helpful for others too!
Final couple of questions @filby89. Got to download the whole MEAD dataset finally.

From the looks of it, you've separated MEAD into `front` and `sides`, right? For example, let's consider this subset of MEAD data: `/media/chakri/Firecuda/M003/video/front/angry/level_1/001.mp4`. I'm assuming I should just clump all the other camera angles into a `sides` folder and preprocess it? (If only some of the other camera angles are used, which ones are they?)

And should I rearrange these files in some way? For example, just make the filenames `M003_front_angry_level_1_001.mp4` and move them to a new folder named `mead_front`? Or can I just preprocess them as-is without removing the parent directories? My doubt was whether the dataloaders will have trouble picking up long sub-directories properly.

Similar doubt with LRS2 as well. Assume the directory tree to be `/media/chakri/lrs2_v1_ext/mvlrs_v1/(main, pretrain)`. I just pointed the preprocessing scripts directly at the `main` folder and things ran fine. Would it also run fine if I point them directly at `mvlrs_v1`, so the `pretrain` subset is included as well?

TL;DR: If we point the config to a certain dir, are videos expected to be at a certain directory depth, or is any depth fine? It would be awesome if you could share a dirtree if the code expects one :smile:
Hey,
yeah, we have them separated (no special reason). We use all camera angles (though not the top and bottom views, just the sides). The different datasets in some cases are in different formats like you mention (some are in recursive folders, some are not), so I suggest that for each dataset you look into the corresponding `get_dataset_X` function. In this function we collect all files for each dataset, so if you want you can make it recursive there!
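For instance, a minimal sketch of non-recursive vs. recursive file collection (`dataset_path` and the `.mp4` pattern are placeholders; the actual `get_dataset_X` functions differ per dataset):

```python
import glob
import os

dataset_path = '/media/chakri/Firecuda'  # hypothetical root

# Non-recursive: only files directly under dataset_path
files = sorted(glob.glob(os.path.join(dataset_path, '*.mp4')))

# Recursive: also walks every sub-directory, at any depth
files = sorted(glob.glob(os.path.join(dataset_path, '**', '*.mp4'), recursive=True))
```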
Hey @filby89
Got the datasets preprocessed and placed them at the right paths. I got the following missing-file error while trying to run the pretraining step. Could you point me in the right direction for where I can fetch them?

`FileNotFoundError: [Errno 2] No such file or directory: 'assets/expression_templates_famos'`

Also, I noticed that `config_train.yaml` contains `BUPT_path` and its related landmark paths. Is this something I need to download and preprocess as well, or is there a way to skip it? (I'm guessing it's not used anywhere, since the related data-fetching scripts are absent.)
Hey @chakri1804 The expression templates should be downloaded when running `quickinstall.sh`. If this didn't download, you can find them at: https://drive.google.com/file/d/1wEL7KPHw2kl5DxP0UAB3h9QcQLXk7BM
I believe you can completely skip BUPT - we did not use it in the end at all, and those configs are unused.
Hey @filby89
I completed the pretraining part with 100 expressions and it worked fine. I then tried the training step with 100 expressions; here's where I encountered the error, in `smirk_trainer.py` @ L222:L223:

```python
expression = self.load_random_template(num_expressions=self.config.arch.num_expression)
flame_feats['expression_params'][gids[2][i],:self.config.arch.num_expression] = (0.25 + 1.25 * torch.rand((1, 1)).to(self.config.device)) * torch.Tensor(expression).to(self.config.device)
```

I believe the FAMOS numpy files you have provided only contain 50-dimensional expression vectors, so the training step fails here. Could you provide the right files so I can continue with the training step?
Hey @filby89
Could you provide a bit more direction on the above issue? I could perform the pretraining part with 100 expressions, but to continue the rest of the training I'd require the FAMOS numpy files with 100 expression params. Could you point me to where I could generate these myself, or, if you have them on hand / can generate them, provide those new asset files?
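For anyone hitting the same mismatch in the meantime, one possible stopgap (an untested assumption on my part, not an official fix) is to zero-pad the provided 50-dim templates to 100 dims; since FLAME expression parameters are PCA coefficients, zeros in the higher-order components should leave the encoded expression unchanged:

```python
import numpy as np

# Hypothetical filename for one of the provided 50-dim FAMOS templates
expression = np.load('assets/expression_templates_famos/some_template.npy')
assert expression.shape[-1] == 50

# Zero-pad the trailing PCA components up to num_expression = 100
padded = np.zeros(expression.shape[:-1] + (100,), dtype=expression.dtype)
padded[..., :50] = expression
```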
Hi, thanks for sharing this amazing repo :smile: I was trying to retrain the model with a couple of modifications. In the process I faced a couple of hitches while trying to run the pre-processing code on the mentioned datasets. It would help me a lot if you could guide me through them.

Errors:

- In `apply_mediapipe_to_dataset.py`, inside the `def preprocess_sample()` function, you used `model_asset_path='assets/face_landmarker.task'`, but the file is not provided within the assets folder. Could you share a link from where I can download this / update the repo with this file?
- In `apply_fan_to_dataset.py`, I noticed you `os.walk()` the root, store the paths, and then in L36 loop over those pairs. In the process we also store `.mp4` and `.avi` files, but then try to read them with `cv2.imread()` in L44. This would in the end throw a `NoneType has no shape attribute` error. How do I handle this case / video datasets?

Generic Doubts:

- The dataset code distinguishes `MEAD_front` and `MEAD_sides`, but the configs do not provide landmark paths for `MEAD_sides`. Are they already provided with the dataset?