FlowFormer: A Transformer Architecture for Optical Flow
Zhaoyang Huang*, Xiaoyu Shi*, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li
ECCV 2022
Our FlowFormer++ and VideoFlow have been accepted by CVPR 2023 and ICCV 2023, and rank 2nd and 1st on the Sintel benchmark, respectively! Please also refer to our FlowFormer++ and VideoFlow.
Similar to RAFT, you will need to download the required datasets to train or evaluate FlowFormer. By default, datasets.py searches for the datasets in the locations listed below; you can create symbolic links in the datasets folder that point to wherever the datasets were actually downloaded (a sketch follows the listing).
├── datasets
    ├── Sintel
        ├── test
        ├── training
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── FlyingChairs_release
        ├── data
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── optical_flow
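For instance, a minimal sketch that creates those links, assuming the datasets were downloaded under /data (adjust the source paths to your setup):

```python
# Link downloaded datasets into ./datasets so datasets.py can find them
# without copying the files. The /data/* sources are placeholders.
import os

links = {
    "/data/Sintel": "datasets/Sintel",
    "/data/KITTI": "datasets/KITTI",
    "/data/FlyingChairs_release": "datasets/FlyingChairs_release",
    "/data/FlyingThings3D": "datasets/FlyingThings3D",
}
os.makedirs("datasets", exist_ok=True)
for src, dst in links.items():
    if not os.path.islink(dst):
        os.symlink(src, dst)
```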
conda create --name flowformer
conda activate flowformer
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy opencv -c pytorch
pip install yacs loguru einops timm==0.4.12 imageio
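A quick, optional sanity check (not part of the repo) that the pinned versions above are active in the environment:

```python
# Print the versions the install commands above should have produced.
import torch
import torchvision
import timm

print(torch.__version__)          # expect 1.6.0
print(torchvision.__version__)    # expect 0.7.0
print(timm.__version__)           # expect 0.4.12
print(torch.cuda.is_available())  # True if the CUDA 10.1 toolkit is usable
```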
The script loads the config according to the training stage, and the trained model is saved under logs and checkpoints. For example, the following script loads the config configs/default.py and saves the trained model as logs/xxxx/final and checkpoints/chairs.pth.
python -u train_FlowFormer.py --name chairs --stage chairs --validation chairs
To finish the entire training schedule, you can run:
./run_train.sh
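In outline, run_train.sh chains the four training stages. A hypothetical Python equivalent is sketched below; the --validation split per stage is an assumption here, and run_train.sh remains the authoritative source for the exact flags:

```python
# Run the four training stages in sequence (chairs -> things -> sintel -> kitti).
# The validation splits below are assumptions, not copied from run_train.sh.
import subprocess

schedule = [
    ("chairs", "chairs"),
    ("things", "sintel"),
    ("sintel", "sintel"),
    ("kitti", "kitti"),
]
for stage, val in schedule:
    subprocess.run(
        ["python", "-u", "train_FlowFormer.py",
         "--name", stage, "--stage", stage, "--validation", val],
        check=True,  # stop the schedule if a stage fails
    )
```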
We provide models trained in the four stages. The default path of the models for evaluation is:
├── checkpoints
    ├── chairs.pth
    ├── things.pth
    ├── sintel.pth
    ├── kitti.pth
    ├── flowformer-small.pth
    ├── things_kitti.pth
flowformer-small.pth is a small version of FlowFormer. things_kitti.pth is the FlowFormer# model introduced in our supplementary material, which is used for evaluation on the KITTI training set.
The model to be evaluated is assigned by the _CN.model entry in the config file.
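For example, the relevant line might look like the following sketch (yacs is already a project dependency; the actual contents of configs/things_eval.py differ):

```python
# Sketch of how an evaluation config assigns the checkpoint to load.
from yacs.config import CfgNode as CN

_CN = CN()
_CN.model = "checkpoints/things.pth"  # checkpoint the evaluation scripts load
```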
Evaluating the model on the Sintel and KITTI training sets. The corresponding config file is configs/things_eval.py.
# with tiling technique
python evaluate_FlowFormer_tile.py --eval sintel_validation
python evaluate_FlowFormer_tile.py --eval kitti_validation --model checkpoints/things_kitti.pth
# without tiling technique
python evaluate_FlowFormer.py --dataset sintel
| Sintel (train), AEPE | with tile | w/o tile |
| --- | --- | --- |
| clean | 0.94 | 1.01 |
| final | 2.33 | 2.40 |
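The tiling technique runs the model on overlapping crops and blends the per-tile flows. The following is a generic, hypothetical sketch of that idea; the tile/stride sizes and the weighting are assumptions, not FlowFormer's exact implementation:

```python
# Generic tiled inference: predict flow on overlapping tiles and blend the
# results with weights that peak at each tile's centre.
import numpy as np

def positions(total, tile, stride):
    """Top-left offsets so that overlapping tiles cover the full extent."""
    pos = list(range(0, total - tile, stride))
    pos.append(total - tile)  # last tile is flush with the edge
    return pos

def tiled_flow(model, img1, img2, tile=(432, 960), stride=(216, 480)):
    """model is a placeholder callable returning a (th, tw, 2) flow array."""
    H, W = img1.shape[:2]
    th, tw = tile
    # weight map: largest at the tile centre, smallest at its borders
    ys = np.linspace(-1.0, 1.0, th)[:, None]
    xs = np.linspace(-1.0, 1.0, tw)[None, :]
    w = np.exp(-(ys ** 2 + xs ** 2))[..., None]
    acc = np.zeros((H, W, 2))
    wsum = np.zeros((H, W, 1))
    for y in positions(H, th, stride[0]):
        for x in positions(W, tw, stride[1]):
            flow = model(img1[y:y+th, x:x+tw], img2[y:y+th, x:x+tw])
            acc[y:y+th, x:x+tw] += w * flow
            wsum[y:y+th, x:x+tw] += w
    return acc / wsum  # weighted average over all covering tiles
```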
Evaluating the small version of the model. The corresponding config file is configs/small_things_eval.py.
# with tiling technique
python evaluate_FlowFormer_tile.py --eval sintel_validation --small
# without tiling technique
python evaluate_FlowFormer.py --dataset sintel --small
| Sintel (train), AEPE | with tile | w/o tile |
| --- | --- | --- |
| clean | 1.21 | 1.32 |
| final | 2.61 | 2.68 |
Generating the submissions for the Sintel and KITTI benchmarks. The corresponding config file is configs/submission.py.
python evaluate_FlowFormer_tile.py --eval sintel_submission
python evaluate_FlowFormer_tile.py --eval kitti_submission
Visualizing the Sintel dataset:
python visualize_flow.py --eval_type sintel --keep_size
Visualizing an image sequence extracted from a video:
python visualize_flow.py --eval_type seq
The default image sequence format is shown below; a sketch for producing such a sequence from a video follows the listing:
├── demo_data
    ├── mihoyo
        ├── 000001.png
        ├── 000002.png
        ├── 000003.png
        .
        .
        .
        ├── 001000.png
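A hypothetical helper (not part of the repo) that writes a video into this layout with OpenCV, which the environment above already includes; the video filename is a placeholder:

```python
# Dump every frame of a video as demo_data/mihoyo/000001.png, 000002.png, ...
import os
import cv2

def extract_frames(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 1
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.png"), frame)
        idx += 1
    cap.release()

extract_frames("demo.mp4", "demo_data/mihoyo")
```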
FlowFormer is released under the Apache License.
@inproceedings{huang2022flowformer,
title={{FlowFormer}: A Transformer Architecture for Optical Flow},
author={Huang, Zhaoyang and Shi, Xiaoyu and Zhang, Chao and Wang, Qiang and Cheung, Ka Chun and Qin, Hongwei and Dai, Jifeng and Li, Hongsheng},
booktitle={Proceedings of the European Conference on Computer Vision},
year={2022}
}
@inproceedings{shi2023flowformer++,
title={{FlowFormer++}: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation},
author={Shi, Xiaoyu and Huang, Zhaoyang and Li, Dasong and Zhang, Manyuan and Cheung, Ka Chun and See, Simon and Qin, Hongwei and Dai, Jifeng and Li, Hongsheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={1599--1610},
year={2023}
}
@article{huang2023flowformer,
title={FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow},
author={Huang, Zhaoyang and Shi, Xiaoyu and Zhang, Chao and Wang, Qiang and Li, Yijin and Qin, Hongwei and Dai, Jifeng and Wang, Xiaogang and Li, Hongsheng},
journal={arXiv preprint arXiv:2306.05442},
year={2023}
}
In this project, we use parts of the code from: