DeepDanbooru uses a CNN with global average pooling (GAP) for the final output, so the input and output sizes do not much affect total memory usage. The model itself requires more than 500 MB, and at least the same amount again is needed for training with minibatch size 1. I think 1.3 GB is too small to train a DeepDanbooru model.
You can try decreasing model size by modifying source code: https://github.com/KichangKim/DeepDanbooru/blob/e15d8bc44847800d29c0f13262c037f8df595275/deepdanbooru/model/resnet.py#L173
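As a minimal illustration of the GAP point (plain Keras, my own sketch, not DeepDanbooru's actual architecture; `tiny_tagger`, `num_tags`, and `filters` are invented names), the classifier head sees the same number of values whatever the input resolution:

```python
import tensorflow as tf

def tiny_tagger(num_tags, filters=32):
    # Accept any H x W; GAP collapses the spatial dims to one value per filter.
    inputs = tf.keras.Input(shape=(None, None, 3))
    x = tf.keras.layers.Conv2D(filters, 3, padding='same', activation='relu')(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)  # -> (batch, filters)
    outputs = tf.keras.layers.Dense(num_tags, activation='sigmoid')(x)
    return tf.keras.Model(inputs, outputs)

model = tiny_tagger(num_tags=100)
print(model(tf.zeros([1, 128, 128, 3])).shape)  # (1, 100)
print(model(tf.zeros([1, 512, 512, 3])).shape)  # (1, 100) - same head either way
```

This is why shrinking the input image mostly saves activation memory, not the weight memory that dominates here.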
Thanks for the reply.
Modifying the network structure is not a preferred choice for me, because I want to retrain the existing network (my dataset is too small for training from scratch).
I will try to use Google Colab; presumably they allow 12 GB per user.
I'm getting OOM on a 1050 Ti (`device:GPU:0 with 1331 MB memory`). It's a usual situation for me, because 1.3 GB of memory is not enough for training large networks, but in the past I've managed to train small and somewhat middle-sized networks on my GPU. So, as usual, the default training settings of DeepDanbooru cause OOM. I've tried gradually reducing the image size, batch size and tag count, but it didn't help.
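A rough back-of-the-envelope estimate (my own, using the ~500 MB weight figure quoted above and assuming an Adam-like optimizer) suggests why those reductions can't be enough, since parameters and optimizer state alone already exceed the card's memory before any activations are allocated:

```python
# Rough estimate, not a measurement: float32 weights plus gradients plus
# two Adam moment buffers is roughly 4x the weight memory.
weights_mb = 500                 # model size quoted in the reply above
training_state_mb = weights_mb * 4  # params + grads + m + v
print(f"~{training_state_mb} MB for parameters and optimizer state alone")
# ~2000 MB, versus the 1331 MB reported for the 1050 Ti
```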
Then, just for fun, I've set the following settings:
```python
image_records = image_records[0:1]  # dataset size 1, originally there were 5K images
tf.config.experimental.set_memory_growth(tf.config.get_visible_devices('GPU')[0], True)
```
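(Note that `set_memory_growth` only changes the allocation strategy, making TensorFlow grab memory on demand instead of reserving nearly all of it up front; it doesn't lower peak usage, so by itself it can't prevent a genuine OOM. A sketch of another knob from the same TF 2.x experimental API, with 1024 MB as a purely example limit:)

```python
import tensorflow as tf

# Sketch only: cap the memory TensorFlow may use on the first GPU.
# Must run before any op initializes the device.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
```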
This resulted in the following model:
And 1 epoch of 1 batch of 1 image:
And it still throws an OOM:
What else can I reduce to 1, lol, to make this work?
I'm using Python 3.6, CUDA 10.1, and all the requirements installed from `requirements.txt`.