Video matting has broad applications, from adding interesting effects to casually captured movies to assisting video production professionals. Matting with associated effects like shadows and reflections has also attracted increasing research activity, and methods like Omnimatte have been proposed to separate foreground objects of interest into their own layers. However, prior works represent video backgrounds as 2D image layers, limiting their capacity to express more complicated scenes, thus hindering application to real-world videos. In this paper, we propose a novel video matting method, OmnimatteRF, that combines 2D foreground layers and a 3D background model. The 2D layers preserve the details of the subjects, while the 3D background robustly reconstructs scenes in real-world videos. Extensive experiments demonstrate that our method reconstructs scenes with better quality on various videos.
OmnimatteRF: Robust Omnimatte with 3D Background Modeling
Geng Lin, Chen Gao, Jia-Bin Huang, Changil Kim, Yipeng Wang, Matthias Zwicker, Ayush Saraf
in ICCV 2023
If you have a containerized environment, you can run our code with this image: logchan/matting:20221229.01 on Docker Hub. It is recommended that you mount four paths inside the container:
/code for this repository
/data for video datasets (see data format)
/output for experiment output
/home/user for storing shell config and PyTorch cache; copy .bashrc to this folder to use fish by default
Check here for an example docker-compose.yaml.
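As a rough illustration of the mount points recommended above, the sketch below assembles a `docker run` command in Python. The host directories are hypothetical placeholders, and `--gpus all` assumes the NVIDIA container toolkit is installed on the host; neither comes from this repository.

```python
import shlex

IMAGE = "logchan/matting:20221229.01"  # image name given in this README

def docker_run_command(code_dir, data_dir, output_dir, home_dir, image=IMAGE):
    """Build a `docker run` argument list with the recommended mounts."""
    mounts = [
        (code_dir, "/code"),        # this repository
        (data_dir, "/data"),        # video datasets
        (output_dir, "/output"),    # experiment output
        (home_dir, "/home/user"),   # shell config and PyTorch cache
    ]
    # Assumption: GPU access is granted via the NVIDIA container toolkit.
    cmd = ["docker", "run", "--rm", "-it", "--gpus", "all"]
    for host, container in mounts:
        cmd += ["-v", f"{host}:{container}"]
    cmd.append(image)
    return cmd

if __name__ == "__main__":
    # Hypothetical host paths; replace them with your own.
    cmd = docker_run_command(
        "/home/me/OmnimatteRF", "/mnt/datasets", "/mnt/output", "/home/me/matting-home"
    )
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run` directly.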
You can set up a Python environment with these packages installed:
torch
torch-efficient-distloss
tinycudann
dataclasses-json
detectron2
hydra-core
kornia
lpips
scikit-image
tensorboard
tqdm
# for running RoDynRF
easydict
ConfigArgParse
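A quick way to see what is still missing from an environment is to probe for each dependency without importing it. The sketch below is not part of the repository; the import names in `PACKAGES` are the ones these pip packages are assumed to expose, and the executables are the two this README requires in PATH.

```python
import importlib.util
import shutil

# pip package name -> assumed top-level import name
PACKAGES = {
    "torch": "torch",
    "torch-efficient-distloss": "torch_efficient_distloss",
    "tinycudann": "tinycudann",
    "dataclasses-json": "dataclasses_json",
    "detectron2": "detectron2",
    "hydra-core": "hydra",
    "kornia": "kornia",
    "lpips": "lpips",
    "scikit-image": "skimage",
    "tensorboard": "tensorboard",
    "tqdm": "tqdm",
    # only needed for running RoDynRF
    "easydict": "easydict",
    "ConfigArgParse": "configargparse",
}
EXECUTABLES = ["ffmpeg", "colmap"]  # colmap is needed for pose estimation only

def missing_modules(packages):
    """Return the pip names whose import module cannot be found."""
    return [pip for pip, mod in packages.items()
            if importlib.util.find_spec(mod) is None]

def missing_executables(names):
    """Return the executables that are not found in PATH."""
    return [name for name in names if shutil.which(name) is None]

if __name__ == "__main__":
    print("missing packages:", missing_modules(PACKAGES))
    print("missing executables:", missing_executables(EXECUTABLES))
```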
Required software in PATH:
ffmpeg
colmap (for pose estimation only)
Download our synthetic and captured datasets from Google Drive.
The following data are needed to run our method:
rgb_1x, input video sequence as image files
poses_bounds.npy or transforms.json, camera poses in the LLFF or NeRF Blender format
flow/flow and flow/flow_backward are forward and backward optical flows written with RAFT writeFlow; flow/confidence contains confidence maps generated by omnimatte
masks/mask, containing one or more subfolders, each providing a coarse mask sequence
depth, monocular depth estimation (required only if using depth loss)
While all paths are configurable with command line arguments, the code by default recognizes the following structure:
/data/matting/wild/bouldering
├── colmap
│   └── poses_bounds.npy
├── depth
│   └── depth
│       └── 00000.npy
├── flow
│   ├── confidence
│   │   └── 0001.png
│   ├── flow
│   │   └── 00000.flo
│   └── flow_backward
│       └── 00000.flo
├── homography
│   └── homographies.npy
├── masks
│   └── mask
│       └── 00
│           └── 00000.png
└── rgb_1x
    └── 00000.png
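Before launching training, it can help to confirm that a video folder follows the default structure above. The following is a minimal sketch rather than the repository's own validation: the folder names come from the tree above, while the location of transforms.json is not specified in this README, so the check simply searches for it anywhere under the dataset root.

```python
from pathlib import Path

# Folders taken from the default structure shown above.
REQUIRED_DIRS = ["rgb_1x", "flow/flow", "flow/flow_backward",
                 "flow/confidence", "masks/mask"]
POSE_FILES = ["colmap/poses_bounds.npy", "rodynrf/poses_bounds.npy"]

def check_dataset(root, use_depth=False):
    """Return a list of human-readable problems; an empty list means OK."""
    root = Path(root)
    problems = []
    required = REQUIRED_DIRS + (["depth/depth"] if use_depth else [])
    for rel in required:
        folder = root / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(f"missing or empty folder: {rel}")
    has_npy = any((root / rel).is_file() for rel in POSE_FILES)
    # Assumption: a Blender-format transforms.json may live anywhere under root.
    has_json = any(root.rglob("transforms.json"))
    if not (has_npy or has_json):
        problems.append("no camera pose file found")
    return problems

if __name__ == "__main__":
    for line in check_dataset("/data/matting/wild/bouldering", use_depth=True):
        print(line)
```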
We also provide scripts for preparing all data required to run our pipeline, and for converting our data format to Omnimatte or Nerfies formats. See using your video for details.
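To inspect the optical flow files listed in the data format above, a small reader is enough. RAFT's writeFlow produces the Middlebury .flo layout: a 4-byte tag (`PIEH`), the width and height as 32-bit integers, then interleaved float32 (u, v) values in row-major order. The sketch below reads that layout with the standard library only; the function names are illustrative and not part of this codebase.

```python
import struct
import sys
from array import array

FLO_TAG = b"PIEH"  # same bytes as the little-endian float32 202021.25

def read_flo(path):
    """Read a Middlebury .flo file and return (width, height, data).

    `data` is a flat float array of length width * height * 2 holding
    interleaved (u, v) values in row-major order.
    """
    with open(path, "rb") as f:
        if f.read(4) != FLO_TAG:
            raise ValueError(f"{path} is not a valid .flo file")
        width, height = struct.unpack("<ii", f.read(8))
        data = array("f")
        data.fromfile(f, width * height * 2)
    if sys.byteorder != "little":
        data.byteswap()  # the file stores little-endian floats
    return width, height, data

def flow_at(data, width, x, y):
    """Return the (u, v) flow vector at pixel column x, row y."""
    i = 2 * (y * width + x)
    return data[i], data[i + 1]
```

For example, `flow_at(data, width, 0, 0)` gives the displacement of the top-left pixel.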
We use hydra for configuring the pipeline, training parameters, and evaluation setups. The entrypoint files and predefined configurations are located in the workflows folder.
You can find the documented config structure in code files under core/config.
To make it easy to prepare data and run experiments, we have created a simple command line interface, ui/cli.py. It requires some setup as it enforces the data organization shown above. See how to use it in Using the CLI.
The CLI simply wraps the commands described below, so you can run them directly if you can't use it.
# Using CLI
python ./ui/cli.py train_ours wild/walk
python ./ui/cli.py train_ours wild/bouldering -- \
data_sources.llff_camera.scene_scale=0.2
# Invoke workflow directly
python workflows/train.py \
--config-name train_both \
output=/output/train/wild/walk/matting/basic-exp \
dataset.path=/data/matting/wild/walk \
dataset.scale=0.25 \
contraction=ndc
python workflows/train.py \
--config-name train_both \
output=/output/train/wild/bouldering/matting/basic-exp \
dataset.path=/data/matting/wild/bouldering \
dataset.scale=0.25 \
data_sources=[flow,mask,colmap] \
contraction=ndc \
data_sources.llff_camera.scene_scale=0.2
In the above commands,
dataset.scale sets the resolution scale of the images. The bouldering video is 1080p and training at 0.5x scale would require ~40GB of VRAM.
data_sources specifies which data folders (apart from images) should be loaded for training. It takes the form [flow,mask,{pose}], where pose should be one of colmap, blender (for synthetic data), or rodynrf (if pose is from RoDynRF). The default is [flow,mask,colmap].
The rodynrf config uses the same npy file format as colmap, but assumes that the file is stored under rodynrf/poses_bounds.npy. It also disables some pose preprocessing steps.
contraction sets how rays should be contracted into a fixed volume for TensoRF. We use ndc for synthetic and COLMAP-reconstructed poses, and mipnerf for RoDynRF-predicted poses.
data_sources.llff_camera.scene_scale scales all camera origins to fit the scene in a smaller volume. In practice this prevents TensoRF from getting OOM errors for some videos.
# Using CLI
python ./ui/cli.py \
train_ours \
wild/bouldering \
--use_depths \
-- \
fg_losses=[alpha_reg,brightness_reg,flow_recons,mask,recons,warped_alpha,bg_tv_reg,robust_depth_matching,bg_distortion] \
fg_losses.robust_depth_matching.config.alpha=0.1 \
fg_losses.bg_distortion.config.alpha=0.01 \
data_sources.llff_camera.scene_scale=0.2
# Invoke workflow directly
python workflows/train.py \
--config-name train_both \
output=/output/train/wild/bouldering/matting/exp-with-depths \
dataset.path=/data/matting/wild/bouldering \
dataset.scale=0.25 \
data_sources=[flow,mask,colmap,depths] \
contraction=ndc \
fg_losses=[alpha_reg,brightness_reg,flow_recons,mask,recons,warped_alpha,bg_tv_reg,robust_depth_matching,bg_distortion] \
fg_losses.robust_depth_matching.config.alpha=0.1 \
fg_losses.bg_distortion.config.alpha=0.01 \
data_sources.llff_camera.scene_scale=0.2
The configs robust_depth_matching and bg_distortion enable monocular depth supervision and distortion loss, respectively.
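When running many experiments, the direct workflow invocations above can be assembled from a dictionary of overrides. The helper below is a hypothetical convenience and not part of this repository; it only formats the `key=value` override strings seen in the commands above, writing Python lists in the bracketed form.

```python
import shlex

def train_command(config_name, output, dataset_path, overrides=None):
    """Build the argument list for workflows/train.py with config overrides."""
    cmd = [
        "python", "workflows/train.py",
        "--config-name", config_name,
        f"output={output}",
        f"dataset.path={dataset_path}",
    ]
    for key, value in (overrides or {}).items():
        if isinstance(value, (list, tuple)):
            # Lists are written as [a,b,c], as in data_sources=[flow,mask,colmap]
            value = "[" + ",".join(str(v) for v in value) + "]"
        cmd.append(f"{key}={value}")
    return cmd

if __name__ == "__main__":
    cmd = train_command(
        "train_both",
        "/output/train/wild/bouldering/matting/basic-exp",
        "/data/matting/wild/bouldering",
        {
            "dataset.scale": 0.25,
            "data_sources": ["flow", "mask", "colmap"],
            "contraction": "ndc",
            "data_sources.llff_camera.scene_scale": 0.2,
        },
    )
    # shlex.join quotes the bracketed values so the shell passes them through intact.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` avoids shell quoting altogether.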
By default, the evaluation script loads pipeline and dataset configurations from training:
# Using CLI
python ./ui/cli.py eval_ours wild/bouldering/exp-with-depths --step 15000
# Invoke workflow directly
python workflows/eval.py \
output=/output/train/wild/bouldering/matting/exp-with-depths/eval/15000 \
checkpoint=/output/train/wild/bouldering/matting/exp-with-depths/checkpoints/checkpoint_15000.pth
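The evaluation command above follows a simple path pattern relative to the experiment folder: the checkpoint sits under `checkpoints/checkpoint_{step}.pth` and the results go to `eval/{step}`. The sketch below derives both from an experiment directory and a step. It is an illustrative helper, and the pattern is inferred only from the example command.

```python
from pathlib import PurePosixPath

def eval_paths(experiment_dir, step):
    """Return (eval_output_dir, checkpoint_path) for a training step."""
    exp = PurePosixPath(experiment_dir)
    output = exp / "eval" / str(step)
    checkpoint = exp / "checkpoints" / f"checkpoint_{step}.pth"
    return output, checkpoint

def eval_command(experiment_dir, step):
    """Build the argument list for workflows/eval.py."""
    output, checkpoint = eval_paths(experiment_dir, step)
    return ["python", "workflows/eval.py",
            f"output={output}", f"checkpoint={checkpoint}"]

if __name__ == "__main__":
    exp = "/output/train/wild/bouldering/matting/exp-with-depths"
    print(" ".join(eval_command(exp, 15000)))
```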
If you find some shadows captured in both foreground and background layers, it may be possible to obtain a clean background by training the TensoRF model from scratch, using the mask from the jointly-trained foreground.
The eval script generates fg_alpha, which is the combined alpha of foreground layers. You can train the background RF using:
# Using CLI
python ui/cli.py \
train_ours \
--config train_bg \
--name retrain_bg \
--mask /output/train/wild/walk/matting/basic-exp/eval/15000/fg_alpha \
wild/walk
# Invoke workflow directly
python workflows/train.py \
--config-name train_bg \
output=/output/train/wild/walk/retrain-bg \
dataset.path=/data/matting/wild/walk \
dataset.scale=0.25 \
data_sources=[mask,colmap] \
data_sources.mask.subpath=/output/train/wild/walk/matting/basic-exp/eval/15000/fg_alpha \
contraction=ndc
For any issues related to code and data, file an issue or email geng@cs.umd.edu.
@InProceedings{Lin_2023_ICCV,
author = {Geng Lin and Chen Gao and Jia-Bin Huang and Changil Kim and Yipeng Wang and Matthias Zwicker and Ayush Saraf},
title = {OmnimatteRF: Robust Omnimatte with 3D Background Modeling},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023}
}
The code is available under the MIT license.
Our codebase contains code from MiDaS, omnimatte, RAFT, RoDynRF, and TensoRF. Their licenses can be found under the licenses folder.