Yijunmaverick / UniversalStyleTransfer

The source code of NIPS17 'Universal Style Transfer via Feature Transforms'.
MIT License

GPU memory requirements #2

Open okdewit opened 6 years ago

okdewit commented 6 years ago

Hey!

I've tried running this on a single GPU with 4GB DDR, but I get:

cuda runtime error (2) : out of memory at ~/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66

Before I break open my PC to install more cards, do you have a rough estimate of what the GPU memory requirements are?

Multiboxer commented 6 years ago

Hey okdewit did you ever solve this? I'm having the same problem with 8GB of GPU memory.

okdewit commented 6 years ago

@Multiboxer Well, CPU mode using "normal" RAM (-gpu -1) solved it. You're right though, it's not really solved yet; I think some estimated system requirements in the readme would be a useful addition.

I also keep running out of normal memory when trying to render high-resolution images on the CPU, but I think that has something to do with LuaJIT limits.

Yijunmaverick commented 6 years ago

@okdewit @Multiboxer

Thanks for your suggestions on estimating the memory. High-resolution is always a challenging issue in deep models.

To run my code on GPUs with small memory, you need to reduce the image size, i.e., the parameters '-contentSize' and '-styleSize' (as shown below). I tested my code on a GPU with 12GB of memory, and the biggest size I can run is around 900.

th test_wct.lua -contentSize 256 -styleSize 256
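
As a rough illustration of why reducing the size helps (a back-of-the-envelope sketch, not a measurement of this code): the Python snippet below sums the float32 feature maps that a VGG-19 encoder produces up to relu5_1 for a single image. The layer table, the name `encoder_activation_bytes`, and the floor-division pooling are assumptions made for illustration. The real peak usage is higher, since it also includes the decoders, the feature transforms, and framework overhead.

```python
# Assumed VGG-19 conv layers up to relu5_1:
# (output channels, downsampling factor relative to the input image).
VGG19_TO_RELU5_1 = [
    (64, 1), (64, 1),                            # conv1_1, conv1_2
    (128, 2), (128, 2),                          # conv2_1, conv2_2
    (256, 4), (256, 4), (256, 4), (256, 4),      # conv3_1 .. conv3_4
    (512, 8), (512, 8), (512, 8), (512, 8),      # conv4_1 .. conv4_4
    (512, 16),                                   # conv5_1
]
BYTES_PER_FLOAT32 = 4


def encoder_activation_bytes(height, width):
    """Lower-bound estimate: bytes needed to hold each conv output once."""
    total = 0
    for channels, factor in VGG19_TO_RELU5_1:
        # Each 2x2 max-pool is assumed to halve height and width (floor).
        total += channels * (height // factor) * (width // factor) * BYTES_PER_FLOAT32
    return total


if __name__ == "__main__":
    for size in (256, 512, 900):
        mib = encoder_activation_bytes(size, size) / 2**20
        print(f"{size}x{size}: about {mib:.1f} MiB of encoder activations")
```

Every term scales with height times width, so halving -contentSize cuts this part of the footprint to roughly a quarter.
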

okdewit commented 6 years ago

@Yijunmaverick Good to know! Could torch/tds help with the memory limit when rendering on CPU? The speed and memory usage on a Ryzen 7 with 64GB of RAM are very much acceptable at sizes over 1000, but it still runs into the 32-bit LuaJIT limit. (https://kvitajakub.github.io/2016/03/08/luajit-memory-limitations/)

taesiri commented 6 years ago

Hello, thanks for your awesome paper and work.

I'm just curious, is this a problem with Torch? I've tested multiple content/style image pairs in both the Torch and TensorFlow implementations. TensorFlow has no problem dealing with high-resolution images (both style and content) on my GTX 1080, but Torch is unable to produce anything with contentSize above 748.

Disclaimer: I've only read the paper, not the implementations (yet).

Yijunmaverick commented 6 years ago

@taesiri Yes, the TensorFlow implementation (by Evan) includes some code optimizations to reduce memory usage. Check the second paragraph in Evan's Readme:

"As in the original paper, reconstruction decoders for layers reluX_1 (X=1,2,3,4,5) are trained separately and then hooked up in a multi-level stylization pipeline in a single graph. To reduce memory usage, a single VGG encoder is loaded up to the deepest relu layer and is shared by all decoders."

taesiri commented 6 years ago

@Yijunmaverick Oh, I see. Thanks for pointing that out.