KichangKim / DeepDanbooru

AI-based multi-label girl image classification system, implemented using TensorFlow.
MIT License

OOM even when using batch size 1, total image count 1, tag count 4 and input size 1x1 #18

Closed saharNooby closed 4 years ago

saharNooby commented 4 years ago

I'm getting OOM on a 1050 Ti (device:GPU:0 with 1331 MB memory). That's a familiar situation for me, since 1.3 GB of memory is not enough for training large networks, but in the past I've managed to train small and medium-sized networks on this GPU.

So, as usual, the default training settings of DeepDanbooru cause OOM. I tried gradually reducing the image size, batch size and tag count, but it didn't help.

Then, just for fun, I've set the following settings:

    "image_width": 1,
    "image_height": 1,
    "minibatch_size": 1,
    "epoch_count": 1,

I also reduced the dataset to a single image and enabled memory growth:

    image_records = image_records[0:1]  # dataset size 1, originally there were 5K images
    tf.config.experimental.set_memory_growth(tf.config.get_visible_devices('GPU')[0], True)

        # Prefetch was disabled:
        #dataset = dataset.prefetch(
        #    buffer_size=tf.data.experimental.AUTOTUNE)

This resulted in the following model:

Model : (None, 1, 1, 3) -> (None, 4)

and 1 epoch of 1 batch of 1 image:

tf.Tensor([[[[0.8124463  0.76407635 0.8057518 ]]]], shape=(1, 1, 1, 3), dtype=float32)
tf.Tensor([[0. 0. 1. 0.]], shape=(1, 4), dtype=float32)

And it still throws an OOM:

2020-04-13 11:02:41.993600: W tensorflow/core/common_runtime/bfc_allocator.cc:424] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1B (rounded to 256).  Current allocation summary follows.
2020-04-13 11:02:42.383291: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Stats: 
Limit:                  1396083507
InUse:                  1396083456
MaxInUse:               1396083456
NumAllocs:                   13265
MaxAllocSize:             13107200

What else can I reduce to 1, lol, to make this work?

I'm using Python 3.6, CUDA 10.1, and all the requirements installed from requirements.txt.

KichangKim commented 4 years ago

DeepDanbooru uses a CNN with global average pooling (GAP) for the final output, so the input and output sizes have little effect on total memory usage. The model itself requires more than 500 MB, and at least the same amount again is needed for training, even with minibatch size 1. I think 1.3 GB is too small to train the DeepDanbooru model.
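As a rough sanity check on this (a sketch, not the project's own code; the ResNet152 stand-in and the 512x512 input size are assumptions), you can estimate the weight memory of a comparable Keras model and keep in mind that training roughly doubles it for gradients and optimizer state:

    import tensorflow as tf

    # Hypothetical stand-in for the DeepDanbooru network; substitute the
    # actual tf.keras.Model that the project builds.
    model = tf.keras.applications.ResNet152(
        weights=None, include_top=False, input_shape=(512, 512, 3))

    params = model.count_params()
    weights_mb = params * 4 / 1024 ** 2  # float32 weights only
    # Gradients and optimizer slots roughly double this during training.
    print(f"{params:,} parameters, about {weights_mb:.0f} MB of weights")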

You can try decreasing the model size by modifying the source code: https://github.com/KichangKim/DeepDanbooru/blob/e15d8bc44847800d29c0f13262c037f8df595275/deepdanbooru/model/resnet.py#L173
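For anyone who goes that route, the kind of change meant here is usually cutting the per-stage filter counts. As a purely illustrative sketch (these numbers are made up; the real ones are in resnet.py), halving every width leaves roughly a quarter of the convolution weights:

    # Illustrative only: the actual filter counts live in deepdanbooru/model/resnet.py.
    original_filters = [64, 128, 256, 512, 1024]
    smaller_filters = [f // 2 for f in original_filters]  # [32, 64, 128, 256, 512]

    # A 3x3 conv has 3 * 3 * in_filters * out_filters weights, so halving both
    # the input and output widths keeps about 1/4 of the parameters (and memory).
    def conv_weights(filters):
        return sum(3 * 3 * f_in * f_out for f_in, f_out in zip(filters, filters[1:]))

    print(conv_weights(original_filters), conv_weights(smaller_filters))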

saharNooby commented 4 years ago

Thanks for the reply.

Modifying the network structure is not my preferred option, because I want to retrain the existing network (my dataset is too small for training from scratch).

I will try Google Colab; presumably it allows around 12 GB of GPU memory per user.
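If it helps anyone else, here is a quick way to check how much GPU memory a Colab session actually grants before starting a long training run (the exact amount varies with the assigned GPU):

    import subprocess
    import tensorflow as tf

    # List the GPU TensorFlow can see, then print the driver's memory report.
    print(tf.config.list_physical_devices('GPU'))
    print(subprocess.check_output(['nvidia-smi']).decode())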