PeterL1n / BackgroundMattingV2

Real-Time High-Resolution Background Matting

What's the difference between model type mattingbase and mattingrefine #58

Closed · bianxg closed this issue 3 years ago

bianxg commented 3 years ago

I tested inference_video.py with both model types, mattingbase and mattingrefine, on a 720p video. With mattingrefine the iteration speed is about 4.57 it/s, while with mattingbase it is about 1.2 s/it. Why is mattingrefine so much faster than mattingbase? Thanks.

```
python inference_video.py --model-type mattingbase \
    --model-backbone mobilenetv2 \
    --model-backbone-scale 0.25 \
    --model-refine-mode sampling \
    --model-refine-sample-pixels 80000 \
    --model-checkpoint "./share/pytorch_mobilenetv2.pth" \
    --video-src "./share/src.mp4" \
    --video-bgr "./share/src.png" \
    --output-dir "./output/" \
    --device cpu \
    --output-type com
```

PeterL1n commented 3 years ago

Please refer to our paper for the architecture.

MattingBase is the Base network, which is a fully convolutional network that operates on the entire image area. For low-resolution images such as 480x270, you can directly use MattingBase.
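Since the base network is fully convolutional, it accepts input at any resolution directly. A minimal sketch of a forward pass, assuming `base_network` is an already-loaded MattingBase model (a hypothetical variable here, not the repo's exact loading code):

```python
import torch

# Hypothetical: base_network is a loaded MattingBase model in eval mode.
src = torch.rand(1, 3, 270, 480)  # a 480x270 source frame, NCHW
bgr = torch.rand(1, 3, 270, 480)  # the matching background plate
with torch.no_grad():
    # Returns alpha matte, foreground, error map, and hidden features
    # (output signature as described in the paper; treat it as an assumption).
    pha, fgr, err, hid = base_network(src, bgr)
```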

MattingRefine is the Base network plus the Refinement network. Internally, MattingRefine takes the input image, say 1920x1080, downsamples it to 480x270, passes it through the base network to get a 480x270 result, and then the refinement network upsamples it back to 1920x1080 by refining only the edges.
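A conceptual sketch of that data flow, assuming hypothetical `base_network` and `refine_network` callables (an illustration of the idea, not the repo's actual module code):

```python
import torch
import torch.nn.functional as F

def matting_refine(image_hr, bgr_hr, base_network, refine_network, scale=0.25):
    # 1. Downsample the high-res inputs (e.g. 1920x1080 -> 480x270 at scale 0.25).
    image_lr = F.interpolate(image_hr, scale_factor=scale,
                             mode='bilinear', align_corners=False)
    bgr_lr = F.interpolate(bgr_hr, scale_factor=scale,
                           mode='bilinear', align_corners=False)

    # 2. Run the full-image base network at low resolution.
    pha_lr, fgr_lr, err_lr, hid_lr = base_network(image_lr, bgr_lr)

    # 3. The refinement network brings the result back to full resolution,
    #    recomputing patches only where the predicted error map is high (the edges).
    pha_hr, fgr_hr = refine_network(image_hr, bgr_hr, pha_lr, fgr_lr, err_lr, hid_lr)
    return pha_hr, fgr_hr
```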

So for 720p videos, you should use MattingRefine, since it internally downsamples to a lower resolution to speed up processing. The --model-backbone-scale flag defines the downsampling ratio: 0.25 means 1/4, so a 720p frame gets downsampled 4x. You can also play around with that parameter to trade off speed against quality. 0.25 was intended for 1080p and 0.125 for 4K; 720p could use something like 0.33.
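For example, adapting the exact command from this thread (only --model-type and --model-backbone-scale change; all paths stay as the ones you used):

```
python inference_video.py --model-type mattingrefine \
    --model-backbone mobilenetv2 \
    --model-backbone-scale 0.33 \
    --model-refine-mode sampling \
    --model-refine-sample-pixels 80000 \
    --model-checkpoint "./share/pytorch_mobilenetv2.pth" \
    --video-src "./share/src.mp4" \
    --video-bgr "./share/src.png" \
    --output-dir "./output/" \
    --device cpu \
    --output-type com
```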