Hey, thanks for your interest in SMIRK :)
Errors:
You need to download the mediapipe face model and place it at `assets/face_landmarker.task`. Direct link: https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task
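For example, a minimal way to fetch it into the location the script expects:

```python
import os
import urllib.request

# Download the mediapipe face landmarker model into the path that
# apply_mediapipe_to_dataset.py expects (assets/face_landmarker.task)
os.makedirs('assets', exist_ok=True)
urllib.request.urlretrieve(
    'https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task',
    'assets/face_landmarker.task')
```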
You are right, this is an oversight and the script does not currently support videos. In the case of videos you should use `cv2.VideoCapture`, get each frame individually, and apply the preprocessing per frame. I will also try to upload a corrected version of this file soon.
Generic Doubts:
https://github.com/georgeretsi/smirk/blob/main/datasets/mead_sides_dataset.py#L88
During our preprocessing we stored the mediapipe landmarks for MEAD sides in the same folder as the videos, so we just load them from there. For the FAN landmarks, we didn't have time to extract these for MEAD sides due to an approaching deadline, so we just skipped them (and we ignore the missing landmarks using the flag in https://github.com/georgeretsi/smirk/blob/main/datasets/base_dataset.py#L208).

Thanks for the quick response @filby89
So essentially, in the case of a video with N frames, should I be emulating what is done in the mediapipe processing, i.e., getting per-frame landmarks of shape [1, 68, 2], concatenating all of them into a numpy array of shape [N, 68, 2], and dumping them under the appropriate name? Thanks for clarifying the MEAD sides bit as well.
I'm still working my way through procuring the datasets and preprocessing them at a snail's pace. Will let you know if I encounter any hitches while retraining :smile:
@chakri1804 Yes, this is the process exactly as you mentioned :)
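For reference, a minimal sketch of that loop (`extract_landmarks` is a hypothetical stand-in for the per-frame mediapipe processing in `apply_mediapipe_to_dataset.py`):

```python
import cv2
import numpy as np

def landmarks_for_video(video_path, extract_landmarks):
    # extract_landmarks(frame) stands in for the per-frame mediapipe
    # landmark extraction and is assumed to return an array of shape (1, 68, 2)
    cap = cv2.VideoCapture(video_path)
    per_frame = []
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        per_frame.append(extract_landmarks(frame))
    cap.release()
    return np.concatenate(per_frame, axis=0)  # shape (N, 68, 2)

# then dump under the appropriate name, e.g.
# np.save('path/to/video_name.npy', landmarks_for_video(...))
```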
Hey @filby89
I noticed that the face-alignment preprocessor was not multi-threaded and would take ages on datasets like LRS2 and MEAD. So I went ahead and modified the mediapipe preprocessor, copying over snippets from face-alignment instead. Pasting the code here in case anyone needs it.

PS: This code only visualises the last frame of the video and dumps it at the given path, unlike the mediapipe one, which stitches a whole video with the landmarks plotted :smile:

PPS: The code tends to accumulate thread memory, so play around with the `num_processes` and `maxtasksperchild` params.
```python
from tqdm import tqdm
import numpy as np
import os
import cv2
from ibug.face_detection import RetinaFacePredictor
from ibug.face_alignment import FANPredictor
import argparse
from ibug.face_alignment.utils import plot_landmarks
from multiprocessing import Pool

# Initialize the argument parser
parser = argparse.ArgumentParser(description='Process images/videos with https://github.com/hhj1897/face_alignment.')
parser.add_argument('--input_dir', type=str, required=True, help='Input directory path')
parser.add_argument('--output_dir', type=str, required=True, help='Output directory path')
parser.add_argument('--vis_dir', type=str, help='Directory to save visualizations')
parser.add_argument('--num_processes', type=int, default=10, help='Number of processes to use for processing')
args = parser.parse_args()

# Collect all image/video files under the input directory (recursively)
all_files = []
for root, _, files in os.walk(args.input_dir):
    for file_name in files:
        if file_name.lower().endswith(('.jpg', '.png', '.mp4', '.avi')):
            all_files.append((root, file_name))


def process_image(root, file_name, output_path, vis_path, face_detector, landmark_detector):
    image = cv2.imread(os.path.join(root, file_name))
    detected_faces = face_detector(image, rgb=False)
    landmarks, scores = landmark_detector(image, detected_faces, rgb=False)
    np.save(output_path, landmarks)
    if vis_path:
        for lmks, scs in zip(landmarks, scores):
            plot_landmarks(image, lmks, scs, threshold=0.2)
        os.makedirs(os.path.dirname(vis_path), exist_ok=True)
        cv2.imwrite(vis_path, image)


def process_video(root, file_name, output_path, vis_path, face_detector, landmark_detector):
    cap = cv2.VideoCapture(os.path.join(root, file_name))
    frame_landmarks = []
    last_frame, last_landmarks, last_scores = None, None, None
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        detected_faces = face_detector(frame, rgb=False)
        landmarks, scores = landmark_detector(frame, detected_faces, rgb=False)
        frame_landmarks.append(landmarks)
        last_frame, last_landmarks, last_scores = frame, landmarks, scores
    cap.release()
    # Plot landmarks on the last frame only (the mediapipe script stitches a full video instead)
    if vis_path and last_frame is not None:
        for lmks, scs in zip(last_landmarks, last_scores):
            plot_landmarks(last_frame, lmks, scs, threshold=0.2)
        os.makedirs(os.path.dirname(vis_path), exist_ok=True)
        cv2.imwrite(os.path.splitext(vis_path)[0] + '.jpg', last_frame)
    if frame_landmarks:
        frame_landmarks = np.concatenate(frame_landmarks, 0)
        np.save(output_path, frame_landmarks)


def process_file(root, file_name, face_detector, landmark_detector):
    input_path = os.path.join(root, file_name)
    rel_path = os.path.relpath(input_path, args.input_dir)
    # Landmarks are saved as .npy files mirroring the input directory structure
    output_path = os.path.join(args.output_dir, os.path.splitext(rel_path)[0] + '.npy')
    vis_path = os.path.join(args.vis_dir, rel_path) if args.vis_dir else None
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    if file_name.lower().endswith(('.jpg', '.png')):
        process_image(root, file_name, output_path, vis_path, face_detector, landmark_detector)
    elif file_name.lower().endswith(('.mp4', '.avi')):
        process_video(root, file_name, output_path, vis_path, face_detector, landmark_detector)


def process_sample(sample):
    # Create a RetinaFace detector (mobilenet0.25 backbone) with the
    # confidence threshold set to 0.8
    face_detector = RetinaFacePredictor(threshold=0.8, device='cuda:0',
                                        model=RetinaFacePredictor.get_model('mobilenet0.25'))
    # Create a facial landmark detector
    landmark_detector = FANPredictor(device='cuda:0', model=FANPredictor.get_model('2dfan2_alt'))
    root, file_name = sample
    process_file(root, file_name, face_detector, landmark_detector)


if __name__ == '__main__':
    # maxtasksperchild recycles workers periodically to keep memory from accumulating
    with Pool(args.num_processes, maxtasksperchild=50) as pool:
        list(tqdm(pool.imap(process_sample, all_files), total=len(all_files)))
```
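(Assuming you save this as e.g. `apply_fan_parallel.py`, run it as `python apply_fan_parallel.py --input_dir <dataset_root> --output_dir <landmarks_root> --vis_dir <vis_root> --num_processes 4`; if memory still accumulates, lower `--num_processes` or the `maxtasksperchild` value in the `Pool` call.)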
Hey @chakri1804, thank you very much for your comment and code :) This will be helpful for others too!
Final couple of questions @filby89. Got to download the whole MEAD dataset finally.

From the looks of it, you've separated MEAD into `front` and `sides`, right? For example, let's consider this subset of MEAD data: `/media/chakri/Firecuda/M003/video/front/angry/level_1/001.mp4`. I'm assuming I should just clump all the other camera angles into a `sides` folder and preprocess it? (If only some of the other camera angles are used, which ones are they?)

And should I rearrange these files in some way? For example, just make the filenames `M003_front_angry_level_1_001.mp4` and move them to a new folder named `mead_front`? Or can I just preprocess them as-is without removing the parent directories? My doubt was whether the dataloaders will have trouble picking up long sub-directories properly.

Similar doubt with LRS2 as well. Assume the directory tree to be `/media/chakri/lrs2_v1_ext/mvlrs_v1/(main, pretrain)`. I just pointed the preprocessing scripts directly at the `main` folder and things ran fine. Would it also run fine if I point them directly at `mvlrs_v1`, so the `pretrain` subset is included as well?

TL;DR: If we point the config to a certain dir, are videos expected to be at a certain directory depth, or is any depth fine? It would be awesome if you could share a dirtree if the code expects one :smile:
Hey,
yeah, we have them separated (no special reason). We use all camera angles (though not the top and bottom views, just the sides). The different datasets in some cases are in different formats like you mention (some are in recursive folders, some are not), so I suggest that for each dataset you look into the corresponding `get_dataset_X` function. In this function we collect all files for each dataset, so if you want you can make it recursive there!
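For instance, a minimal sketch of non-recursive vs. recursive file collection (`dataset_path` and the `.mp4` pattern are placeholders; the actual `get_dataset_X` functions differ per dataset):

```python
import glob
import os

dataset_path = '/media/chakri/Firecuda'  # hypothetical root

# Non-recursive: only files directly under dataset_path
files = sorted(glob.glob(os.path.join(dataset_path, '*.mp4')))

# Recursive: also walks every sub-directory, at any depth
files = sorted(glob.glob(os.path.join(dataset_path, '**', '*.mp4'), recursive=True))
```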
Hey @filby89
Got the datasets preprocessed and placed them at the right paths. I got the following missing-file error while trying to run the pretraining step. Could you point me in the right direction for where I can fetch them?

`FileNotFoundError: [Errno 2] No such file or directory: 'assets/expression_templates_famos'`

Also, I noticed that `config_train.yaml` contains `BUPT_path` and its related landmark paths. Is this something I need to download and preprocess as well, or is there a way to skip it? (I'm guessing it's not used anywhere, since the related data-fetching scripts are absent.)
Hey @chakri1804 The expression templates should be downloaded when running `quickinstall.sh`. If this didn't download, you can find them at: https://drive.google.com/file/d/1wEL7KPHw2kl5DxP0UAB3h9QcQLXk7BM
I believe you can completely skip BUPT - we did not use it in the end at all, and those configs are unused.
Hey @filby89
I completed the pretraining part with 100 expressions and it worked fine. I then tried the training step with 100 expressions; here's where I encountered the error, in `smirk_trainer.py` @ L222:L223:

```python
expression = self.load_random_template(num_expressions=self.config.arch.num_expression)
flame_feats['expression_params'][gids[2][i],:self.config.arch.num_expression] = (0.25 + 1.25 * torch.rand((1, 1)).to(self.config.device)) * torch.Tensor(expression).to(self.config.device)
```

I believe the FAMOS numpy files you have provided only contain 50-dimensional expression vectors, so the training step fails here. Could you provide the right files so I can continue with the training step?
Hey @filby89
Could you provide a bit more direction on the above issue? I could perform the pretraining part with 100 expressions, but to continue the rest of the training I'd require the FAMOS numpy files with 100 expression params. Could you point me to where I could generate these myself, or, if you have them on hand / can generate them, provide those new asset files?
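For anyone hitting the same mismatch in the meantime, one possible stopgap (an untested assumption on my part, not an official fix) is to zero-pad the provided 50-dim templates to 100 dims; since FLAME expression parameters are PCA coefficients, zeros in the higher-order components should leave the encoded expression unchanged:

```python
import numpy as np

# Hypothetical filename for one of the provided 50-dim FAMOS templates
expression = np.load('assets/expression_templates_famos/some_template.npy')
assert expression.shape[-1] == 50

# Zero-pad the trailing PCA components up to num_expression = 100
padded = np.zeros(expression.shape[:-1] + (100,), dtype=expression.dtype)
padded[..., :50] = expression
```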
Hi, thanks for sharing this amazing repo :smile: I was trying to retrain the model with a couple of modifications. In the process I faced a couple of hitches while trying to run the pre-processing code on the mentioned datasets. It would help me a lot if you could guide me through them.

Errors:

- In `apply_mediapipe_to_dataset.py`, inside the `def preprocess_sample()` function, you used `model_asset_path='assets/face_landmarker.task'`, but the file is not provided within the assets folder. Could you share a link from where I can download this / update the repo with this file?
- In `apply_fan_to_dataset.py`, I noticed you `os.walk()` the root, store the paths, and then in L36 loop over those pairs. In the process we also store `.mp4` and `.avi` files, but then try to read them with `cv2.imread()` in L44. This would in the end throw a `NoneType has no shape attribute` error. How do I handle this case / video datasets?

Generic Doubts:

- The dataset code distinguishes `MEAD_front` and `MEAD_sides`, but the configs do not provide landmark paths for `MEAD_sides`. Are they already provided with the dataset?