A Proposal for Inference on High-Resolution Images with Limited GPU VRAM (Less Than 6GB)

Kiteretsu77 / APISR

APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)

GNU General Public License v3.0


We can split a high-resolution image into multiple fixed-size, non-overlapping patches, run inference on each patch, and finally merge the upscaled patches to obtain the full high-resolution output. I have already implemented this, and it is indeed feasible: it lets GPUs with limited VRAM, such as an RTX 3060 Laptop GPU with 6GB of VRAM, upscale 1080p images. Notably, it seems to have no apparent negative effect on the quality of the upscaled image.

The idea is motivated by the Vision Transformer and by your paper: a Vision Transformer splits the image into multiple patches for tokenization, and your paper actually trains on portions of high-resolution images rather than on whole images. Moreover, I suppose this approach could also be used to accelerate inference with multiple GPUs.
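A minimal sketch of the split, upscale, and merge loop described above, in Python with NumPy. It is not the implementation from this proposal or from APISR. `tiled_upscale`, `upscale_fn`, and `nearest_x2` are hypothetical names: `upscale_fn` stands in for the model's forward pass on one patch. The proposal does not say how partial patches at the right and bottom borders are handled, so padding them up to the fixed size by repeating border pixels is an assumption.

```python
import numpy as np
from typing import Callable


def tiled_upscale(
    image: np.ndarray,
    upscale_fn: Callable[[np.ndarray], np.ndarray],
    scale: int,
    patch: int = 256,
) -> np.ndarray:
    """Upscale an H x W x C image one fixed-size, non-overlapping patch at a time.

    upscale_fn (hypothetical model stand-in) maps a (patch, patch, C) array
    to a (patch * scale, patch * scale, C) array.
    """
    h, w, c = image.shape
    out = np.zeros((h * scale, w * scale, c), dtype=image.dtype)
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = image[top:top + patch, left:left + patch]
            th, tw = tile.shape[:2]
            # Assumption: border tiles smaller than `patch` are padded to the
            # fixed size by repeating edge pixels; the padding is cropped below.
            if th < patch or tw < patch:
                tile = np.pad(tile, ((0, patch - th), (0, patch - tw), (0, 0)), mode="edge")
            up = upscale_fn(tile)
            # Paste only the valid (unpadded) region into the output canvas.
            out[top * scale:(top + th) * scale, left * scale:(left + tw) * scale] = (
                up[: th * scale, : tw * scale]
            )
    return out


def nearest_x2(tile: np.ndarray) -> np.ndarray:
    """Toy 2x nearest-neighbour "model" used only to check the stitching."""
    return tile.repeat(2, axis=0).repeat(2, axis=1)


img = np.random.randint(0, 256, size=(300, 500, 3), dtype=np.uint8)
result = tiled_upscale(img, nearest_x2, scale=2, patch=128)
assert np.array_equal(result, nearest_x2(img))
```

Only one `patch` × `patch` tile passes through the model at a time, so peak memory during the forward pass depends on `patch` rather than on the full image resolution. The toy `nearest_x2` upscaler makes the stitching exactly verifiable; a real network would replace it.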


A Proposal for Inference on High-Resolution Images with Limited GPU VRAM (Less Than 6GB) #13