Rethinking Inductive Biases for Surface Normal Estimation
Official implementation of the paper
Rethinking Inductive Biases for Surface Normal Estimation \
CVPR 2024 [oral] \
Gwangbin Bae and Andrew J. Davison \
[paper.pdf]
[arXiv]
[youtube]
[project page]
Abstract
Despite the growing demand for accurate surface normal estimation models, existing methods use general-purpose dense prediction models, adopting the same inductive biases as other tasks. In this paper, we discuss the inductive biases needed for surface normal estimation and propose to (1) utilize the per-pixel ray direction and (2) encode the relationship between neighboring surface normals by learning their relative rotation. The proposed method can generate crisp — yet, piecewise smooth — predictions for challenging in-the-wild images of arbitrary resolution and aspect ratio. Compared to a recent ViT-based state-of-the-art model, our method shows a stronger generalization ability, despite being trained on an orders of magnitude smaller dataset.
Getting started
We provide the instructions in four steps. For example, if you just want to test DSINE on some images, you can stop after Step 1, which keeps installation and downloads to a minimum.
Step 1. Test DSINE on some images (requires minimal dependencies)
Start by installing dependencies.
```
conda create --name DSINE python=3.10
conda activate DSINE
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
python -m pip install geffnet
```
Then, download the model weights from this link and save them under `projects/dsine/checkpoints/`, keeping the same folder structure as on Google Drive. For example, `checkpoints/exp001_cvpr2024/dsine.pt` (on Google Drive) is our best model, and it should be saved as `projects/dsine/checkpoints/exp001_cvpr2024/dsine.pt`. The corresponding config file is `projects/dsine/experiments/exp001_cvpr2024/dsine.txt`.
The models under `checkpoints/exp002_kappa/` (on Google Drive) are the ones that can also estimate uncertainty.
Then, move to the folder `projects/dsine/`, and run
```
python test_minimal.py ./experiments/exp001_cvpr2024/dsine.txt
```
This will generate predictions for the images under `projects/dsine/samples/img/`. The results will be saved under `projects/dsine/samples/output/`.
Our model assumes known camera intrinsics, but providing approximate intrinsics still gives good results. For some images in `projects/dsine/samples/img/`, the corresponding camera intrinsics (fx, fy, cx, cy, assuming a perspective camera with no distortion) are provided as a `.txt` file. If no such file exists, the intrinsics are approximated by assuming a $60^\circ$ field of view.
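The fallback can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `intrinsics_from_fov` is hypothetical, and it assumes that the field of view is measured along the longer image side, that pixels are square (fx = fy), and that the principal point lies at the image center.

```python
import math


def intrinsics_from_fov(height, width, fov_deg=60.0):
    """Approximate pinhole intrinsics (fx, fy, cx, cy) from a field of view.

    Assumptions (not confirmed by the source): the field of view spans the
    longer image side, pixels are square, and the principal point is at the
    image center.
    """
    longer_side = max(height, width)
    # Pinhole model: tan(fov / 2) = (side / 2) / focal_length
    focal = (longer_side / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    cx = width / 2.0
    cy = height / 2.0
    return focal, focal, cx, cy


fx, fy, cx, cy = intrinsics_from_fov(height=480, width=640)
print(f"fx={fx:.1f}, fy={fy:.1f}, cx={cx:.1f}, cy={cy:.1f}")
```

If you know the true intrinsics of your camera, providing them is preferable to this approximation.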
Step 2. Test DSINE on benchmark datasets & run a real-time demo
Install additional dependencies.
```
python -m pip install tensorboard
python -m pip install opencv-python
python -m pip install matplotlib
python -m pip install pyrealsense2 # needed only for demo using a realsense camera
python -m pip install vidgear # needed only for demo on YouTube videos
python -m pip install yt_dlp # needed only for demo on YouTube videos
python -m pip install mss # needed only for demo on screen capture
```
Download the evaluation datasets (`dsine_eval.zip`) from this link.
**NOTE:** By downloading the dataset, you are agreeing to the respective LICENSE of each dataset. The link to the dataset can be found in the respective `readme.txt`.
In `projects/__init__.py`, there are two variables, `DATASET_DIR` and `EXPERIMENT_DIR`:
* `DATASET_DIR` is where your dataset should be stored. For example, the `dsine_eval` dataset (downloaded from the link above) should be saved under `DATASET_DIR/dsine_eval`. Update this variable.
* `EXPERIMENT_DIR` is where the experiments (e.g. model weights, logs, etc.) will be saved. Update this variable.
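For illustration, the two variables might be set as follows. The paths are placeholders, not values from the repository, and the derived locations only restate the folder layout described in this README.

```python
import os

# Placeholder paths (assumptions): replace with locations on your machine.
DATASET_DIR = "/path/to/datasets"
EXPERIMENT_DIR = "/path/to/experiments"

# The evaluation data is expected directly under DATASET_DIR.
eval_dir = os.path.join(DATASET_DIR, "dsine_eval")

# Outputs of an experiment are grouped by project, experiment folder and config name.
test_dir = os.path.join(EXPERIMENT_DIR, "dsine", "exp001_cvpr2024", "dsine", "test")

print(eval_dir)
print(test_dir)
```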
Then, move to the folder `projects/dsine/`, and run:
```
# getting benchmark performance on the six evaluation datasets
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode benchmark
# getting benchmark performance on the six evaluation datasets (with visualization)
# it will be saved under EXPERIMENT_DIR/dsine/exp001_cvpr2024/dsine/test/
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode benchmark --visualize
# generate predictions for the images in `projects/dsine/samples/img/`
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode samples
# measure the throughput (inference speed) on your device
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode throughput
```
You can also run a real-time demo by running:
```
# captures your screen and makes prediction
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode screen
# demo using webcam
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode webcam
# demo using a realsense camera
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode rs
# demo on a YouTube video (replace with a different link)
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode "https://www.youtube.com/watch?v=X-iEq8hWd6k"
```
For each input option, there are some additional parameters. See `projects/dsine/test.py` for more information.
You can also try building your own real-time demo. Please see [this notebook](https://github.com/baegwangbin/DSINE/blob/main/notes/real_time_demo.ipynb) for more information.
Step 3. Train DSINE
In `projects/dsine/`, run:
```
python train.py ./experiments/exp000_test/test.txt
```
Then run `tensorboard --logdir EXPERIMENT_DIR/dsine/exp000_test/test/log` to open TensorBoard.
This will train the model on the train split of the NYUv2 dataset, which should be under `DATASET_DIR/dsine_eval/nyuv2/train/`. There are only 795 images here, and the performance will not be good. To get better results you need to:
**(1) Create a custom dataloader**
>We are checking if we can release the entire training dataset (~400GB). Before the release, you can try building your custom dataloader. You need to define a `get_sample(args, sample_path, info)` function and provide a data split in `data/datasets`. Check how they are defined/provided for other datasets. You also need to update `projects/baseline_normal/dataloader.py` so the newly defined `get_sample` function can be used.
**(2) Generate GT surface normals** (optional)
>In case your dataset does not come with ground truth surface normal maps, you can try generating them from the ground truth depth maps. Please see [this notebook](https://github.com/baegwangbin/DSINE/blob/main/notes/depth_to_normal.ipynb) for more information.
**(3) Customize data augmentation**
>In case you are using synthetic images, you need the right set of data augmentation functions to minimize the synthetic-to-real domain gap. We provide a wide range of augmentation functions, but the hyperparameters are not finetuned and you can potentially get better results by finetuning them. Please see [this notebook](https://github.com/baegwangbin/DSINE/blob/main/notes/visualize_augmentation.ipynb) for more information.
Step 4. Start your own surface normal estimation project
If you want to start your own surface normal estimation project, you can do so very easily.
First of all, have a look at `projects/baseline_normal`. This is a place where you can try different CNN architectures without worrying about the camera intrinsics and rotation estimation. You can try popular architectures like U-Net, and try different backbones. In this folder, you can run:
```
python train.py ./experiments/exp000_test/test.txt
```
The project-specific `config` is defined in `projects/baseline_normal/config.py`. The default config, which is shared across all projects, is in `projects/__init__.py`.
The dataloaders are in `projects/baseline_normal/dataloader.py`. We use the same dataloaders in the `dsine` project, so there is no `projects/dsine/dataloader.py`.
The losses are defined in `projects/baseline_normal/losses.py`. These are building blocks for your custom loss functions in your own project. For example, in the DSINE project, we produce a list of predictions and the loss is the weighted sum of the losses computed for each prediction. You can see how this is done in `projects/dsine/losses.py`.
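The weighted sum over a list of predictions can be sketched as follows. This is an illustration rather than the code in `projects/dsine/losses.py`: the helper name `weighted_sum_loss` and the exponential weighting with `gamma` (later predictions weighted more heavily) are assumptions. Because it only uses `*` and `+`, it works with plain floats as well as PyTorch scalar tensors.

```python
def weighted_sum_loss(per_prediction_losses, gamma=0.8):
    """Combine the losses of successive predictions into one scalar.

    Assumption (hypothetical weighting): prediction i out of n gets the
    weight gamma ** (n - 1 - i), so the final prediction has weight 1.
    """
    n = len(per_prediction_losses)
    total = 0.0
    for i, loss in enumerate(per_prediction_losses):
        weight = gamma ** (n - 1 - i)
        total = total + weight * loss
    return total


# Example with plain floats standing in for per-prediction losses.
losses = [1.0, 0.5, 0.25]
total = weighted_sum_loss(losses, gamma=0.8)
print(round(total, 4))
```

In a training loop, each element of the list would be the loss computed for one entry of the model's prediction list against the same ground truth.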
You can start a new project by copying the folder `projects/dsine` to create `projects/NEW_PROJECT_NAME`. Then, update the `config.py` and `losses.py`.
Lastly, you should update `train.py` and `test.py`. For things that should differ between projects, we left notes like the following:
```python
#↓↓↓↓
#NOTE: forward pass
img = data_dict['img'].to(device)
intrins = data_dict['intrins'].to(device)
...
pred_list = model(img, intrins=intrins, mode='test')
norm_out = pred_list[-1]
#↑↑↑↑
```
Search for the arrows (↓↓↓↓/↑↑↑↑) to see where things should be modified in different projects.
The test commands above (e.g. for getting the benchmark performance and running the real-time demo) should work the same way for all projects.
Additional instructions
If you want to make contributions to this repo, please make a pull request and add instructions in the following format.
Using torch hub to predict normal (contribution by hugoycj)
NOTE: the code below is deprecated and should be modified (as the folder structure has changed).
```
import argparse

import cv2
import numpy as np
import torch

# Command-line arguments: paths of the input image and the output normal map
parser = argparse.ArgumentParser()
parser.add_argument("--input", required=True)
parser.add_argument("--output", required=True)
args = parser.parse_args()

# Load the normal predictor model from torch hub
normal_predictor = torch.hub.load("hugoycj/DSINE-hub", "DSINE", trust_repo=True)

# Load the input image using OpenCV (BGR channel order)
image = cv2.imread(args.input, cv2.IMREAD_COLOR)
h, w = image.shape[:2]

# Use the model to infer the normal map from the input image
with torch.inference_mode():
    # The transpose below implies a channel-first (3, H, W) tensor here
    normal = normal_predictor.infer_cv2(image)[0]
    normal = (normal + 1) / 2  # Convert values to the range [0, 1]

# Convert the normal map to a displayable (H, W, 3) uint8 image
normal = (normal * 255).cpu().numpy().astype(np.uint8).transpose(1, 2, 0)
normal = cv2.cvtColor(normal, cv2.COLOR_RGB2BGR)

# Save the output normal map to a file
cv2.imwrite(args.output, normal)
```
If the weights cannot be downloaded (e.g. there is no network access), you can load local weights through torch hub as shown below:
```
normal_predictor = torch.hub.load("hugoycj/DSINE-hub", "DSINE", local_file_path='./checkpoints/dsine.pt', trust_repo=True)
```
Generating ground truth surface normals
We provide the code used to generate the ground truth surface normals from ground truth depth maps. See [this notebook](https://github.com/baegwangbin/DSINE/blob/main/notes/depth_to_normal.ipynb) for more information.
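As a rough idea of what such a conversion involves, here is a minimal NumPy sketch. It is not the notebook's method: the function name `depth_to_normal` is hypothetical, and the sketch assumes a distortion-free pinhole camera, a dense depth map without holes, and simple finite differences. The sign is chosen so that a constant-depth plane yields (0, 0, 1), matching the example given in the coordinate system section of this README.

```python
import numpy as np


def depth_to_normal(depth, fx, fy, cx, cy):
    """Estimate per-pixel surface normals from a dense depth map of shape (H, W).

    Assumptions (not taken from the notebook): pinhole camera without
    distortion, no missing depth values, finite-difference tangents.
    """
    height, width = depth.shape
    u, v = np.meshgrid(np.arange(width, dtype=np.float64),
                       np.arange(height, dtype=np.float64))

    # Back-project every pixel to a 3D point in camera coordinates.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    points = np.stack([x, y, depth], axis=-1)  # (H, W, 3)

    # Tangent vectors along image rows (v) and columns (u).
    d_dv, d_du = np.gradient(points, axis=(0, 1))

    # The normal is perpendicular to both tangents; normalize to unit length.
    normal = np.cross(d_du, d_dv)
    length = np.linalg.norm(normal, axis=-1, keepdims=True)
    return normal / np.clip(length, 1e-12, None)


# A fronto-parallel plane at a depth of 2 (arbitrary units).
depth = np.full((4, 5), 2.0)
normals = depth_to_normal(depth, fx=500.0, fy=500.0, cx=2.5, cy=2.0)
print(normals[0, 0])
```

Real depth maps are noisy and contain depth discontinuities, so a practical pipeline would also need to handle invalid pixels and object boundaries, which this sketch does not attempt.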
About the coordinate system
We use the right-handed coordinate system with (X, Y, Z) = (right, down, front). An important thing to note is that both the ground truth normals and our prediction are the outward normals. For example, in the case of a fronto-parallel wall facing the camera, the normals would be (0, 0, 1), not (0, 0, -1). If you instead need the inward normals, use `normals = -normals`.
Sharing your model weights
If you wish to share your model weights, please make a pull request by providing the corresponding config file and the link to the weights.
Citation
If you find our work useful in your research, please consider citing our paper:
@inproceedings{bae2024dsine,
title = {Rethinking Inductive Biases for Surface Normal Estimation},
author = {Gwangbin Bae and Andrew J. Davison},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024}
}
If you use the models that also estimate the uncertainty, please also cite the following paper, where we introduced the loss function:
@InProceedings{bae2021eesnu,
title = {Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation},
author = {Gwangbin Bae and Ignas Budvytis and Roberto Cipolla},
booktitle = {International Conference on Computer Vision (ICCV)},
year = {2021}
}