Repository of the 3DI method for 3D face reconstruction via 3DMM fitting. The implementation is based on CUDA programming and therefore requires an NVIDIA GPU. Below we explain how to install and run this implementation.
Models
The Basel Face Model (`01_MorphableModel.mat`) and the Expression Model (`Exp_Pca.bin`) need to be downloaded separately; the Installation section below describes where to place them.
Software
The implementation requires CUDA and an NVIDIA GPU, as noted above. A number of Python packages are also needed; these can be installed by following the instructions in Section 2 of Installation below.
Download the code by running the commands below:
```
wget https://github.com/Computational-Psychiatry/3DI/archive/refs/tags/v0.2.0.zip
unzip v0.2.0.zip
cd 3DI-0.2.0
```
and compile the CUDA code as below:
```
cd build
chmod +x builder.sh
./builder.sh
```
Install the necessary packages via pip. It is advised to use a virtual environment and to update pip first:
```
cd build
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
```
The necessary packages can then simply be installed by running:
```
pip install -r requirements.txt
```
Make sure that you have downloaded the Basel Face Model (`01_MorphableModel.mat`) and the Expression Model (`Exp_Pca.bin`) as highlighted in the Requirements section above. Then, copy these model files into the `build/models/raw` directory. Specifically, these files should be at the following locations:
```
build/models/raw/01_MorphableModel.mat
build/models/raw/Exp_Pca.bin
```
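For example, assuming you saved the downloaded files in `~/Downloads` (adjust the source paths to wherever they actually are):
```
cp ~/Downloads/01_MorphableModel.mat build/models/raw/
cp ~/Downloads/Exp_Pca.bin build/models/raw/
```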
Then run the following Python script to adapt these models to the 3DI code:
```
cd build/models
python3 prepare_BFM.py
```
You also need to unpack the landmark models etc. in the same directory:
```
tar -xvzf lmodels.tar.gz
```
Go to the `build` directory and, if you used a virtual environment, activate it by running `source env/bin/activate`. Then, the following is an example command (it will produce visualization videos as well):
```
python process_video.py ./output testdata/elaine.mp4
```
The produced files are in the subdirectory created under the `output` directory. The file with the expression parameters has the extension `.expressions_smooth`, and the pose parameters have the extension `.poses_smooth`. The script above takes approximately 6 minutes to run, and this includes the production of the visualization videos as well as temporal smoothing. Some tips to reduce total processing time (a command combining them is shown further below):

- Set `--produce_3Drec_videos=False` to skip rendering the 3D reconstruction videos.
- Set `--smooth_pose_exp=False` to skip the temporal smoothing of the pose and expression parameters.
- Set `--cfgid=2` to use the 2nd configuration file, which is faster, although results will include more jitter.

If you want to compute local expression basis coefficients (experimental), you can run:
```
python process_video.py ./output testdata/elaine.mp4 --compute_local_exp_coeffs=True
```
More parameters can be seen by running:
```
python process_video.py --help
```
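For instance, a single command combining the speed tips above (all flags are the ones documented in this section) would be:
```
python process_video.py ./output testdata/elaine.mp4 --produce_3Drec_videos=False --smooth_pose_exp=False --cfgid=2
```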
The `process_video.py` script does a series of pre- and post-processing steps for reconstruction (details are in the section below). It is important to know that we first estimate the identity parameters of the subject in the video by using a small subset of the video frames, and then we compute pose and expression coefficients at every frame. Thus, the identity parameters are held common throughout the video.
The `process_video.py` script does a series of processes on the video. Specifically, it performs the following steps in this order (these correspond to the rows of the timing table further below):

1. Face detection
2. Facial landmark detection
3. Identity learning (estimation of identity parameters from a subset of frames)
4. 3D reconstruction (per-frame pose and expression estimation)
5. Temporal smoothing
6. Production of visualization videos

The first four steps are visualized below; the blue text indicates the extension of the corresponding files.
The 2D landmarks estimated by 3DI can also optionally be produced based on the files above. Below are the extensions of some of the output files provided by the 3DI video analysis software:
- `.expressions_smooth`: A text file that contains all the expression coefficients of the video. That is, the file contains a `Tx79` matrix, where the t-th row contains the 79 expression coefficients of the expression (PCA) model.
- `.poses_smooth`: A text file that contains all the pose coefficients of the video. The file contains a `Tx9` matrix, where the first 3 columns contain the 3D translation `(tx, ty, tz)` for all the `T` frames of the video and the last 3 columns contain the rotation (`yaw, pitch, roll`) for all the `T` frames.
- `.local_exp_coeffs.*` (requires `--compute_local_exp_coeffs=True`): Localized expression coefficients for the video; a text file that contains `TxK` entries, where `K` is the number of coefficients of the basis that is used to compute the expressions. See the video here for an illustration of what each coefficient corresponds to (e.g., the 25th coefficient in this text file indicates activation of the 25th basis component in the video).
- `.2Dlmks`: A text file with the 51 landmarks corresponding to the inner face (see below), as predicted by 3DI. The file contains a matrix of size `Tx102`, where each row is of the format: `x0 y0 x1 y1 ... x50 y50`.
- `.canonicalized_lmks`: A text file with the 51 landmarks corresponding to the inner face (see below), after removing identity and pose variation. The file contains a matrix of size `Tx153`, where each row is of the format: `x0 y0 z0 x1 y1 z1 ... x50 y50 z50`.
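A minimal sketch of reading these outputs with NumPy (the file paths here are hypothetical; the actual names depend on your input video, and we assume the files are whitespace-delimited text, as described above):
```python
import numpy as np

# Hypothetical paths; replace with the files produced for your own video.
expressions = np.loadtxt("output/elaine/elaine.expressions_smooth")  # shape (T, 79)
poses = np.loadtxt("output/elaine/elaine.poses_smooth")              # shape (T, 9)

translation = poses[:, :3]   # (tx, ty, tz) for each of the T frames
rotation = poses[:, -3:]     # (yaw, pitch, roll) for each of the T frames

print(expressions.shape, translation.shape, rotation.shape)
```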
The following are the relative IDs of the landmarks corresponding to each facial feature:
```python
{'lb': [0, 1, 2, 3, 4],                          # left brow
 'rb': [5, 6, 7, 8, 9],                          # right brow
 'no': [10, 11, 12, 13, 14, 15, 16, 17, 18],     # nose
 'le': [19, 20, 21, 22, 23, 24],                 # left eye
 're': [25, 26, 27, 28, 29, 30],                 # right eye
 'ul': [31, 32, 33, 34, 35, 36, 43, 44, 45, 46], # upper lip
 'll': [37, 38, 39, 40, 41, 42, 47, 48, 49, 50]} # lower lip
```
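As an example, the sketch below extracts the landmarks of one facial feature from a `.2Dlmks` file using the IDs above (the file path is hypothetical; the row format `x0 y0 ... x50 y50` is the one documented above):
```python
import numpy as np

# Landmark IDs per facial feature, as listed above.
FEATURES = {'lb': [0, 1, 2, 3, 4], 'rb': [5, 6, 7, 8, 9],
            'no': list(range(10, 19)), 'le': list(range(19, 25)),
            're': list(range(25, 31)),
            'ul': [31, 32, 33, 34, 35, 36, 43, 44, 45, 46],
            'll': [37, 38, 39, 40, 41, 42, 47, 48, 49, 50]}

lmks = np.loadtxt("output/elaine/elaine.2Dlmks")  # hypothetical path; shape (T, 102)
lmks = lmks.reshape(-1, 51, 2)                    # (T, 51 landmarks, x/y)

left_eye = lmks[:, FEATURES['le'], :]             # per-frame left-eye landmarks, shape (T, 6, 2)
print(left_eye.shape)
```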
The processing of a video has a number of steps (see Running the code), and the table below lists the computation time for each of these. We provide computation times for two different configuration files (see `cfgid` above). The default configuration file (`cfgid=1`) leads to significantly less jitter in the results but also to longer processing times, whereas the second one (`cfgid=2`) works faster but yields more jittery videos.
Note that the main script for videos (`process_video.py`) includes a number of optional steps like post smoothing and visualization. These can be turned off using the parameters outlined in the section Running the code.
| | Config. 1 | Config. 2 | Time unit |
|---|---|---|---|
| Face detection† | 18.83 | 18.83 | ms per frame |
| Landmark detection† | 137.92 | 137.92 | ms per frame |
| Identity learning† | 95542 | 62043 | ms per video |
| 3D reconstruction† | 331.72 | 71.27 | ms per frame |
| Smoothing | 82.84 | 82.84 | ms per frame |
| Production of 3D reconstruction videos | 199.35 | 199.35 | ms per frame |
| Production of 2D landmark videos | 32.70 | 32.70 | ms per frame |
†Required step
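As a back-of-the-envelope use of this table, the sketch below estimates the total processing time of a video under each configuration (the numbers are copied from the table; it assumes all optional steps are enabled and that the config-independent steps cost the same under both configurations):
```python
# Estimate total processing time (in minutes) for a video with a given
# number of frames, using the per-step timings from the table above.
def estimate_minutes(num_frames, cfg=1):
    per_frame_ms = {1: 18.83 + 137.92 + 331.72 + 82.84 + 199.35 + 32.70,
                    2: 18.83 + 137.92 + 71.27 + 82.84 + 199.35 + 32.70}[cfg]
    per_video_ms = {1: 95542, 2: 62043}[cfg]  # identity learning, once per video
    return (num_frames * per_frame_ms + per_video_ms) / 1000 / 60

# e.g., a 30-fps, 3-minute video has 5400 frames
print(estimate_minutes(5400, cfg=1), estimate_minutes(5400, cfg=2))
```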
The performance of 3DI is expected to improve if one incorporates the matrix of the camera that was used to record the data into the reconstruction process. We provide a calibration procedure, outlined below. (The procedure is for the camera of a MacBook Pro 21; you may replace the string `macbookpro21` with your camera model wherever applicable.)
1. Print the calibration pattern (`./build/models/cameras/pattern.png`).
2. Create a directory for the calibration videos, `build/calibration/videos`:

   ```
   mkdir build/calibration/videos
   ```
3. Record a video of the printed pattern with your camera and place it in `build/calibration/videos`. In the rest of this tutorial, we assume that the video file is at the following location: `build/calibration/videos/macbookpro21.mp4`.
4. Go to the `build` directory (`cd build`) and create the directories for the video frames:

   ```
   mkdir calibration/videos/frames
   mkdir calibration/videos/frames/macbookpro21
   ```
5. Extract the frames of the video with the `ffmpeg` command:

   ```
   ffmpeg -i calibration/videos/macbookpro21.mp4 -r 1 calibration/videos/frames/macbookpro21/frame%04d.png
   ```
6. Inspect the frames in `calibration/videos/frames/macbookpro21/`. Remove any images where there is too much motion blur or the checkerboard is not fully in the frame.
7. Run the calibration software from the `build` directory:

   ```
   ./calibrate_camera "calibration/videos/frames/macbookpro21/frame*.png" "models/cameras/macbookpro21.txt"
   ```
   (If you run into GUI-related errors, you can remove the `imread`'s within the `camera.h` file that cause the need for a GUI and re-compile the code.)
8. The calibration file will be created at `build/models/cameras/macbookpro21.txt`.
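For reference, the sketch below shows the general checkerboard-calibration idea in Python with OpenCV. It is not the `calibrate_camera` implementation; in particular, the board size and the output format are assumptions, and the frame paths follow the tutorial above:
```python
import glob
import cv2
import numpy as np

frames = sorted(glob.glob("calibration/videos/frames/macbookpro21/frame*.png"))
board = (9, 6)  # number of inner checkerboard corners; adjust to your pattern

# 3D coordinates of the board corners in the board's own plane (z = 0)
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in frames:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Estimate the camera matrix K and the distortion coefficients
# (assumes at least one frame with a detected board)
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(K, dist)
```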
Once you have obtained the calibration matrix, it can be incorporated into the reconstruction process by using two additional command line arguments, namely `camera_param` and `undistort`. The former argument should be set to the camera calibration file and the latter to `1`, as shown below:
```
python process_video.py ./output testdata/elaine.mp4 --camera_param=./models/cameras/macbookpro21.txt --undistort=1
```
(This command is for illustration only, since the `elaine.mp4` video was clearly not recorded with a MacBook Pro.)