Fixes + Run-One-Image demo

This pull request fixes two bugs.

The first is an invalid import in modelling/backbone, where a non-existent module (vit_d646) is imported. From context, I assumed it should import vit_grid instead.

The second affects systems whose default text encoding is cp1252, such as Windows with a Western locale. configs/common/diffusion.py contained a Chinese comment, which caused detectron2 to raise a UnicodeDecodeError when lazy-loading the file. The file was being opened with the cp1252 encoding, which does not support Chinese characters.
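For reviewers who cannot reproduce the second bug locally, the failure mode can be sketched without detectron2. The snippet below is only an illustration: the sample line and the file name diffusion_like.py are hypothetical stand-ins, not the actual contents of configs/common/diffusion.py. It writes a UTF-8 file containing a Chinese comment, then reads it back once as cp1252 and once as UTF-8.

```python
import os
import tempfile

# Hypothetical stand-in for a config line with a Chinese comment.
# "\u4e0d" encodes to the UTF-8 bytes E4 B8 8D, and byte 0x8D has no
# assigned character in cp1252, so decoding as cp1252 fails.
SAMPLE_SOURCE = "steps = 10  # \u4e0d\u8981\u4fee\u6539\n"


def try_read(path, encoding):
    """Return (True, text) if the file decodes with `encoding`, else (False, error message)."""
    try:
        with open(path, encoding=encoding) as f:
            return True, f.read()
    except UnicodeDecodeError as exc:
        return False, str(exc)


def demo():
    """Write the sample as UTF-8, then read it back as cp1252 and as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "diffusion_like.py")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as f:
            f.write(SAMPLE_SOURCE)
        return try_read(path, "cp1252"), try_read(path, "utf-8")


if __name__ == "__main__":
    cp1252_result, utf8_result = demo()
    print("cp1252:", cp1252_result)
    print("utf-8: ", utf8_result)
```

Not every Chinese character triggers the error, because many UTF-8 byte sequences happen to map to valid cp1252 characters and decode silently into mojibake instead. That would explain why a bug like this can go unnoticed until a particular comment is added.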

This pull request also includes a script to perform inference on a single image out of the box. The arguments are as follows:

- --config-dir (configs/ViTS_1024.py): path to the configuration file for the model
- --checkpoint-dir: path to the model file
- --image-dir (demo/retriever_rgb.png): path to the source image to perform inference on
- --trimap-dir (demo/retriever_trimap.png): path to the trimap for the source image
- --output-dir (demo/result.png): path to write the resulting image to
- --device (cuda): PyTorch device to use; may be cpu if CUDA is not supported
- --sample-strategy (ddim10): sampling strategy; the number affects the step count
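As a rough guide to how these arguments fit together, here is a minimal argparse sketch. It is not the script from this pull request: build_parser is a hypothetical helper, the parenthesized values above are assumed to be defaults, and --checkpoint-dir is assumed to be required because no value is listed for it.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the arguments described above."""
    parser = argparse.ArgumentParser(
        description="Run matting inference on a single image (sketch)."
    )
    parser.add_argument("--config-dir", default="configs/ViTS_1024.py",
                        help="path to the configuration file for the model")
    # Assumption: no default is listed, so the checkpoint is treated as required.
    parser.add_argument("--checkpoint-dir", required=True,
                        help="path to the model file")
    parser.add_argument("--image-dir", default="demo/retriever_rgb.png",
                        help="path to the source image")
    parser.add_argument("--trimap-dir", default="demo/retriever_trimap.png",
                        help="path to the trimap for the source image")
    parser.add_argument("--output-dir", default="demo/result.png",
                        help="path to write the resulting image to")
    parser.add_argument("--device", default="cuda",
                        help="PyTorch device, e.g. cuda or cpu")
    parser.add_argument("--sample-strategy", default="ddim10",
                        help="sampling strategy; the number affects the step count")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

argparse converts the dashes in flag names to underscores, so the values are read as args.config_dir, args.sample_strategy, and so on.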

This script was modified from ViTMatte. The demo images also come from the ViTMatte repository.

The README will most likely need to be updated as well.

YihanHu-2022 / DiffMatte

Fixes + Run-One-Image demo #1