This is a reimplementation of the paper "Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation" (Original Github).
Compared with the original Caffe version, this version includes all of the training code, including the code for training the foreground and background cues, so new datasets can be used.
This version supports multi-GPU training, which is much faster than the original Caffe version.
Besides the VGG16 base network, ResNet50 is also provided as a backbone.
On the VOC12 validation set, the VGG16 version scores slightly below the paper (IoU: 50.2 vs. 50.7), likely due to training randomness. The ResNet50 version scores much higher (IoU: 55.3).
The code is implemented in MXNet. Please go to the official website (HERE) for installation, and make sure MXNet is compiled with OpenCV support.
The other Python dependencies are listed in "dependencies.txt" and can be installed with:
pip install -r dependencies.txt
Two datasets are needed: PASCAL VOC12 (HERE) and SBD (HERE). Extract them into the folder "dataset", and then run:
python create_dataset.py
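Before running the script, it can help to confirm that the archives extracted where expected. A minimal sketch of such a check — the subfolder names "VOCdevkit" and "benchmark_RELEASE" are assumptions about how the two archives extract, not names taken from create_dataset.py:

```python
import os

def check_dataset_dir(root, expected=("VOCdevkit", "benchmark_RELEASE")):
    """Return the expected subfolders missing under `root`.

    The subfolder names are assumptions about how VOC12 and SBD
    extract; adjust them to whatever create_dataset.py expects.
    """
    return [name for name in expected
            if not os.path.isdir(os.path.join(root, name))]

missing = check_dataset_dir("dataset")
if missing:
    print("Missing under dataset/:", ", ".join(missing))
```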
Download the models pretrained on ImageNet (HERE), extract the files, and put them into the folder "models".
In "cores.config.py", the base network can be changed by editing "conf.BASE_NET"; the other parameters can also be tweaked there.
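Since the config is just a Python module, switching backbones is a one-line edit. A hypothetical sketch of the pattern — only "conf.BASE_NET" is named in this README; the class shape and the other attribute are illustrative:

```python
class Config:
    """Stand-in for the config object in "cores.config.py"."""
    def __init__(self):
        self.BASE_NET = "vgg16"  # backbone name; "resnet50" is the other option
        self.LR = 1e-3           # hypothetical extra parameter

conf = Config()
conf.BASE_NET = "resnet50"       # switch to the ResNet50 backbone
```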
The training process involves three steps: training the background (bg) cues, training the foreground (fg) cues, and training the SEC model:
python train_bg_cues.py --gpus 0,1,2,3
python train_fg_cues.py --gpus 0,1,2,3
python train_SEC.py --gpus 0,1,2,3
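The --gpus flag takes a comma-separated list of device ids. A sketch of how such a flag is typically parsed into a device list — this is an illustration of the convention, not the repo's actual parser:

```python
import argparse

def parse_args(argv):
    # Hypothetical parser mirroring the --gpus 0,1,2,3 convention above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpus", type=str, default="0",
                        help="comma-separated GPU ids, e.g. 0,1,2,3")
    args = parser.parse_args(argv)
    # Split the flag into a list of integer device ids.
    args.gpu_ids = [int(i) for i in args.gpus.split(",")]
    return args

args = parse_args(["--gpus", "0,1,2,3"])
print(args.gpu_ids)  # [0, 1, 2, 3]
```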
The snapshots will be saved in the folder "snapshots". To evaluate a snapshot (for example, epoch 8), simply run:
python eval.py --gpu 0 --epoch 8
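The scores quoted above are mean IoU over classes. A small illustrative sketch of the metric on flat label lists — not the repo's eval.py:

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt.

    pred, gt: flat lists of integer class labels of equal length.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:                 # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))  # mean of 1/2 and 2/3
```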
There are additional flags:
--savemask save the output masks
--crf apply CRF as post-processing
--flip also use flipped images at inference
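For the --flip option, a common scheme is to average the scores from the original image and a horizontally flipped copy, flipping the second prediction back before averaging. A sketch on a toy 1-D score row, with `predict` as a placeholder for the network forward pass (the real code runs the model):

```python
def flip(row):
    # Horizontal flip of a 1-D row of scores.
    return row[::-1]

def predict_with_flip(row, predict):
    """Average predictions over the input and its horizontal flip.

    `predict` stands in for a forward pass of the network; the flipped
    prediction is flipped back so it aligns with the original.
    """
    scores = predict(row)
    scores_flipped = flip(predict(flip(row)))
    return [(a + b) / 2 for a, b in zip(scores, scores_flipped)]
```

For example, with a predictor that always returns its first element repeated, `predict_with_flip([1.0, 2.0, 3.0], lambda r: [r[0]] * len(r))` averages the two passes into `[2.0, 2.0, 2.0]`.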