deepfakes / faceswap

Deepfakes Software For All
https://www.faceswap.dev
GNU General Public License v3.0

Problem "Resource exhausted: OOM when allocating tensor" #40

Closed yappica closed 6 years ago

yappica commented 6 years ago

Hi, I tested training on the CPU and everything worked fine: the preview window appeared and the training data was saved in the models directory. But when I switched to training on the GPU, there was no preview window, training finished within 2 minutes, and nothing was saved in the models directory.
I don't know what happened; maybe it's the video card I'm using. I have a GTX 660, which supports CUDA 3.0. Any help would be appreciated. Thank you.

These are screenshots from training on the GPU. The key error is: Resource exhausted: OOM when allocating tensor with shape[3, 3, 128, 256]

Ganonmaster commented 6 years ago

Resource exhausted: OOM when allocating tensor with shape[3, 3, 128, 256]

OOM = Out of Memory. This means there is not enough graphics memory available to load the training data. It is possible that this is related to this issue, which I will most probably be fixing tomorrow or the day after (or, if you are capable, you could attempt to fix it yourself and send a pull request for us to merge).

While that may be the case here, remember that a GTX 660 has only 2GB of usable graphics memory in its default configuration. Most people who have gotten this running on a GPU were using at least 4GB of graphics memory. It is therefore possible that this is not related to that bug and that you simply lack the required memory.
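As a side note, here is a minimal sketch (assuming the TF 1.x / Keras stack that faceswap used at the time; this is not code from the faceswap repository) of telling TensorFlow to allocate GPU memory on demand instead of reserving it all at startup. On a 2GB card this can only help with spurious OOMs, not with a model that genuinely needs more memory:

import tensorflow as tf
from keras import backend as K

# Let the TF 1.x session grow its GPU allocation as needed instead of
# grabbing (nearly) all of the card's memory up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))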

yappica commented 6 years ago

Thanks for explaining it. Is there a way to specify the size of the loaded training data? Even if I load only one picture for training, it still shows the OOM error. Does tensorflow-gpu have a minimum video memory requirement (at least 4GB)? Thanks.

Ganonmaster commented 6 years ago

It's not necessarily about Tensorflow. It's not necessarily about the pictures you are using. It's about the training model as well. There are a lot of factors to consider. Essentially, for this specific processing task, you need around 3-4GB of graphics memory. I'm unsure about specifying the size of the loaded training data; I would have to dive into Tensorflow and Keras in more detail. Perhaps this is something that can be improved in the future, but it looks like you might be stuck with CPU training.
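For reference, one generic way to fall back to CPU training is to hide the GPU from TensorFlow through an environment variable before it is imported. This is a standard TensorFlow/CUDA mechanism, not a faceswap-specific option, so treat it as a sketch:

import os

# Hide all CUDA devices so TensorFlow falls back to the CPU.
# This must be set before TensorFlow (or Keras) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf  # now only sees the CPU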

yappica commented 6 years ago

Open model.py in lib and try reducing the filter counts (the original Encoder is shown here; a rough reduced-filter sketch is included at the end of this comment):

def Encoder():
    # conv(), upscale(), IMAGE_SHAPE and ENCODER_DIM are defined elsewhere in model.py;
    # the numbers passed to conv() below are the filter counts to reduce.
    input_ = Input(shape=IMAGE_SHAPE)
    x = input_
    x = conv(128)(x)
    x = conv(256)(x)
    x = conv(512)(x)
    x = conv(1024)(x)
    x = Dense(ENCODER_DIM)(Flatten()(x))
    x = Dense(4 * 4 * 1024)(x)
    x = Reshape((4, 4, 1024))(x)
    x = upscale(512)(x)
    return Model(input_, x)

It works. This is the link. Thanks deepfakes.
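Purely for illustration, a reduced-filter variant of that Encoder might look like the sketch below. The halved values are a guess, not numbers taken from the repository; the Decoder's input shape (defined elsewhere in model.py) has to be adjusted to match the smaller encoder output, and old model checkpoints will no longer load:

def Encoder():
    # Illustrative low-memory variant: every filter count is roughly halved.
    input_ = Input(shape=IMAGE_SHAPE)
    x = input_
    x = conv(64)(x)    # was conv(128)
    x = conv(128)(x)   # was conv(256)
    x = conv(256)(x)   # was conv(512)
    x = conv(512)(x)   # was conv(1024)
    x = Dense(ENCODER_DIM)(Flatten()(x))
    x = Dense(4 * 4 * 512)(x)      # was 4 * 4 * 1024
    x = Reshape((4, 4, 512))(x)    # was (4, 4, 1024)
    x = upscale(256)(x)            # was upscale(512)
    return Model(input_, x)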

Clorr commented 6 years ago

Also, this person had the same problem because their card only has 2GB: https://www.reddit.com/r/deepfakes/comments/7mqob8/first_try_with_smaller_network/

deepfakes commented 6 years ago

This issue was moved to deepfakes/faceswap-playground#13