My implementation of BiSeNetV1 and BiSeNetV2.
mIOUs and fps on cityscapes val set:

| model | ss | ssc | msf | mscf | fps(fp32/fp16/int8) | link |
| --- | --- | --- | --- | --- | --- | --- |
| bisenetv1 | 75.44 | 76.94 | 77.45 | 78.86 | 112/239/435 | download |
| bisenetv2 | 74.95 | 75.58 | 76.53 | 77.08 | 103/161/198 | download |
mIOUs on cocostuff val2017 set:

| model | ss | ssc | msf | mscf | link |
| --- | --- | --- | --- | --- | --- |
| bisenetv1 | 31.49 | 31.42 | 32.46 | 32.55 | download |
| bisenetv2 | 30.49 | 30.55 | 31.81 | 31.73 | download |
mIOUs on ade20k val set:

| model | ss | ssc | msf | mscf | link |
| --- | --- | --- | --- | --- | --- |
| bisenetv1 | 36.15 | 36.04 | 37.27 | 36.58 | download |
| bisenetv2 | 32.53 | 32.43 | 33.23 | 31.72 | download |
Tips:
ss means single-scale evaluation, ssc means single-scale crop evaluation, msf means multi-scale evaluation with flip augment, and mscf means multi-scale crop evaluation with flip augment. The eval scales and crop size for multi-scale evaluation can be found in the configs; a minimal sketch of the msf procedure appears after these tips.
The fps is tested in a different way from the paper. For more information, please see here.
The authors of bisenetv2 used cocostuff-10k, while I used what might be called cocostuff-123k (for lack of a better name: the same 118k train and 5k val images as the object detection split). Thus the results may differ from the paper.
The authors did not report results on ade20k, so there are no official training settings; here I simply provide a "make it work" result. The results on ade20k could probably be boosted with better settings.
The model has a big variance, meaning the results of repeated training runs differ by a relatively big margin. For example, if you train bisenetv2 on cityscapes many times, you will observe the ss evaluation result vary between 73.1 and 75.1.
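For clarity, here is a minimal sketch of the msf procedure, assuming a model that returns per-pixel logits (possibly together with auxiliary heads); the scale values below are illustrative, and the actual ones live in the config files:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def eval_msf(model, im, scales=(0.5, 0.75, 1.0, 1.25, 1.5), flip=True):
    """im: a (1, 3, H, W) normalized tensor; returns (1, n_cats, H, W) probs."""
    H, W = im.shape[2:]
    probs = 0.
    for scale in scales:
        size = (int(H * scale), int(W * scale))
        x = F.interpolate(im, size=size, mode='bilinear', align_corners=False)
        views = [x, torch.flip(x, dims=(3,))] if flip else [x]
        for i, v in enumerate(views):
            logits = model(v)
            if isinstance(logits, (tuple, list)):  # keep only the main head
                logits = logits[0]
            if i == 1:  # un-flip the prediction of the flipped view
                logits = torch.flip(logits, dims=(3,))
            logits = F.interpolate(logits, size=(H, W),
                                   mode='bilinear', align_corners=False)
            probs = probs + logits.softmax(dim=1)
    return probs / probs.sum(dim=1, keepdim=True)
```

The final prediction is then `probs.argmax(dim=1)`, as with single-scale evaluation.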
tensorrt
You can go to tensorrt for details.
ncnn
You can go to ncnn for details.
openvino
You can go to openvino for details.
tis
Triton Inference Server (TIS) provides a solution for deploying models as a service. You can go to tis for details.
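As an illustration of the client side, here is a minimal sketch using the tritonclient package; the model name 'bisenetv2' and the tensor names 'input_image' and 'preds' are hypothetical, so check the tis directory for the actual model configuration:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url='localhost:8000')

# a preprocessed image batch; shape and dtype must match the deployed model config
im = np.random.rand(1, 3, 1024, 2048).astype(np.float32)

inp = httpclient.InferInput('input_image', list(im.shape), 'FP32')  # hypothetical input name
inp.set_data_from_numpy(im)
out = httpclient.InferRequestedOutput('preds')  # hypothetical output name

res = client.infer(model_name='bisenetv2', inputs=[inp], outputs=[out])
preds = res.as_numpy('preds')
print(preds.shape)
```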
My platform is like this:
With a pretrained weight, you can run inference on a single image like this:
$ python tools/demo.py --config configs/bisenetv2_city.py --weight-path /path/to/your/weights.pth --img-path ./example.png
This would run inference on the image and save the result to ./res.jpg.
Or you can run inference on a video like this:
$ python tools/demo_video.py --config configs/bisenetv2_coco.py --weight-path res/model_final.pth --input ./video.mp4 --output res.mp4
This would generate the segmentation result as res.mp4. If you want to read from a camera, you can set --input camera_id rather than --input ./video.mp4.
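If you would rather call the model from Python, this rough sketch mirrors what tools/demo.py does; the normalization constants and the random palette are illustrative assumptions, so check the repo's actual transforms and entry points:

```python
import numpy as np
import cv2
import torch

@torch.no_grad()
def demo_image(model, img_path, out_path='./res.jpg', n_cats=19):
    im = cv2.imread(img_path)[:, :, ::-1]             # BGR -> RGB
    x = torch.from_numpy(im.astype(np.float32)).permute(2, 0, 1)[None]
    x = (x / 255. - 0.5) / 0.5                        # illustrative normalization
    logits = model(x)
    if isinstance(logits, (tuple, list)):             # keep only the main head
        logits = logits[0]
    preds = logits.argmax(dim=1)[0].cpu().numpy()
    palette = np.random.randint(0, 256, (n_cats, 3), dtype=np.uint8)
    cv2.imwrite(out_path, palette[preds][:, :, ::-1]) # RGB -> BGR for cv2
```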
1. cityscapes
Register and download the dataset from the official website. Then decompress the zips into the datasets/cityscapes directory:
$ mv /path/to/leftImg8bit_trainvaltest.zip datasets/cityscapes
$ mv /path/to/gtFine_trainvaltest.zip datasets/cityscapes
$ cd datasets/cityscapes
$ unzip leftImg8bit_trainvaltest.zip
$ unzip gtFine_trainvaltest.zip
2. cocostuff
Download train2017.zip, val2017.zip and stuffthingmaps_trainval2017.zip from the official website. Then do as follows:
$ unzip train2017.zip
$ unzip val2017.zip
$ mv train2017/ /path/to/BiSeNet/datasets/coco/images
$ mv val2017/ /path/to/BiSeNet/datasets/coco/images
$ unzip stuffthingmaps_trainval2017.zip
$ mv train2017/ /path/to/BiSeNet/datasets/coco/labels
$ mv val2017/ /path/to/BiSeNet/datasets/coco/labels
$ cd /path/to/BiSeNet
$ python tools/gen_dataset_annos.py --dataset coco
3. ade20k
Download ADEChallengeData2016.zip from the official website and unzip it. Then move the uncompressed folders to datasets/ade20k, and generate the txt files with the script I prepared for you:
$ unzip ADEChallengeData2016.zip
$ mv ADEChallengeData2016/images /path/to/BiSeNet/datasets/ade20k/
$ mv ADEChallengeData2016/annotations /path/to/BiSeNet/datasets/ade20k/
$ python tools/gen_dataset_annos.py --ade20k
4. custom dataset
If you want to train on your own dataset, you should first generate annotation files in the following format:
munster_000002_000019_leftImg8bit.png,munster_000002_000019_gtFine_labelIds.png
frankfurt_000001_079206_leftImg8bit.png,frankfurt_000001_079206_gtFine_labelIds.png
...
Each line is a pair of a training image path and its ground-truth label path, separated by a single comma.
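As a starting point, here is a minimal sketch that writes such an annotation file, assuming a hypothetical layout where images and label maps share file stems under data_root/images and data_root/labels; adapt the naming logic to your dataset:

```python
import os

def gen_anno_file(data_root, out_path):
    im_dir = os.path.join(data_root, 'images')
    lb_dir = os.path.join(data_root, 'labels')
    lines = []
    for im_name in sorted(os.listdir(im_dir)):
        stem = os.path.splitext(im_name)[0]
        lb_name = stem + '.png'  # adjust to your label naming scheme
        assert os.path.exists(os.path.join(lb_dir, lb_name)), lb_name
        # each line: image path and label path, comma separated
        lines.append(f'images/{im_name},labels/{lb_name}')
    with open(out_path, 'w') as f:
        f.write('\n'.join(lines))
```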
I recommend you check the information of your dataset with this script:
$ python tools/check_dataset_info.py --im_root /path/to/your/data_root --im_anns /path/to/your/anno_file
This will print some information about your dataset.
Then you need to change the fields im_root and train/val_im_anns in the config file. I prepared a demo config file for you named bisenet_customer.py; you can start from this config file.
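For illustration, the fields would look roughly like this, assuming the configs are plain Python dicts; any key other than im_root and train/val_im_anns below is an assumption, so check bisenet_customer.py for the real set:

```python
cfg = dict(
    model_type='bisenetv2',                 # assumed key: which model to build
    n_cats=20,                              # assumed key: number of classes in your dataset
    im_root='./datasets/my_dataset',        # root that the annotation paths are relative to
    train_im_anns='./datasets/my_dataset/train.txt',
    val_im_anns='./datasets/my_dataset/val.txt',
)
```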
The training commands I used to train the models can be found here.
Note:
Though bisenetv2 has fewer flops, it requires many more training iterations, so the training time of bisenetv1 is shorter.
You can also load the trained model weights and finetune from them, like this:
$ export CUDA_VISIBLE_DEVICES=0,1
$ torchrun --nproc_per_node=2 tools/train_amp.py --finetune-from ./res/model_final.pth --config ./configs/bisenetv2_city.py # or bisenetv1
You can also evaluate a trained model like this:
$ python tools/evaluate.py --config configs/bisenetv1_city.py --weight-path /path/to/your/weight.pth
Or you can use multiple gpus:
$ torchrun --nproc_per_node=2 tools/evaluate.py --config configs/bisenetv1_city.py --weight-path /path/to/your/weight.pth
You can check the old directory for the original implementation if you need it, though I believe you will not.