OOM error when training on my own dataset

Cheng-Lin-Li / SegCaps

A Clone version from Original SegCaps source code with enhancements on MS COCO dataset.

Apache License 2.0


OOM error when training on my own dataset #5

Closed by CreepZzy 5 years ago

CreepZzy commented 5 years ago

Hello, thanks for the code and documents, they are easy to understand.

When I train the model on my own dataset (16 grayscale 2D PNG images of 512 x 512 pixels, with black-and-white masks), it always fails with this error: tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,1,4,512,512,2] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

I have two GTX 1080 GPUs with 8GB of dedicated memory each. The command I use for training is: python ./main.py --train --split_num=0 --batch_size=2 --aug_data=1 --loglevel=2 --net segcapsr3 --data_root_dir=data --which_gpus=-1 --gpus=2 --loss bce --dataset mscoco17

In addition, is there a way to produce test outputs in black and white instead of yellow and purple?

Cheng-Lin-Li commented 5 years ago

Hi CreepZzy,

The model was not tested on 2D grayscale images, so you may need to modify the code. For the memory issue, my suggestion is to reduce the batch size or to restrict TensorFlow's GPU memory usage.

Good luck to you.

With kind regards, Cheng-Lin Li
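One way to act on the suggestion to restrict TensorFlow's GPU usage is through environment variables set before TensorFlow is imported. The following is a minimal sketch, not code from this repository: `configure_gpu_env` is a hypothetical helper, and it assumes the installed TensorFlow version honors `TF_FORCE_GPU_ALLOW_GROWTH` (older releases may ignore it). `CUDA_VISIBLE_DEVICES` is the standard CUDA variable for limiting which GPUs a process can see.

```python
import os


def configure_gpu_env(visible_gpus="0", allow_growth=True):
    """Set GPU-related environment variables (hypothetical helper).

    Call this before importing TensorFlow, because the variables are
    read when the GPU devices are first initialized.
    """
    # Limit which physical GPUs this process can see (comma-separated indices).
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_gpus
    # Ask TensorFlow to grow GPU allocations on demand instead of reserving
    # nearly all memory up front. Assumption: the installed TensorFlow
    # version supports this variable.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true" if allow_growth else "false"


configure_gpu_env(visible_gpus="0,1")
# import tensorflow as tf  # import only after the variables are set
```

Note that on-demand allocation only changes how memory is reserved. It does not reduce what the model actually needs, so an OOM caused by the network's size will still occur.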

lalonderodney commented 5 years ago

Hello @CreepZzy,

Unfortunately, at 512 x 512 input size the current SegCaps architecture takes up almost the entire memory of a 12GB GPU. My suggestions would be either to change the code to accept something like 256 x 256 inputs, or to make the SegCaps network lighter (fewer capsule types per layer or fewer layers). Hope this helps. For the test output, just change the plotting within test.py to whatever colors you want.
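To put the input-size suggestion in numbers, the size of the tensor named in the error message can be worked out directly. This is a small arithmetic sketch: it assumes "type float" means 32-bit floats (4 bytes per element), and the 256 x 256 shape is a hypothetical version of the same tensor with both spatial dimensions halved. `tensor_mib` is an illustrative helper, not part of SegCaps.

```python
from math import prod

BYTES_PER_FLOAT32 = 4


def tensor_mib(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in MiB of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element / (1024 ** 2)


reported = (16, 1, 4, 512, 512, 2)  # shape taken from the OOM message
halved = (16, 1, 4, 256, 256, 2)    # assumed shape at 256 x 256 inputs

print(tensor_mib(reported))  # 128.0
print(tensor_mib(halved))    # 32.0
```

The tensor in the message is only the allocation that happened to fail. Many tensors of comparable size are kept alive for backpropagation, so the total is far larger, but each of them shrinks by the same factor of four when the spatial dimensions are halved.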

CreepZzy commented 5 years ago

@lalonderodney Thank you for your helpful reply. I'll try these modifications. I didn't know that CapsNet could be so GPU-hungry.

lalonderodney commented 5 years ago

@CreepZzy,

To do the dynamic routing of child capsules to parent capsules, you must store the intermediate representations to be routed, and this takes memory. It's similar to how DenseNets, despite having far fewer parameters, take more memory. Somewhat recently, several memory-efficient implementations of DenseNets have come out. Hopefully something similar will eventually be developed by intelligent people for capsules :)
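The gap between parameter count and memory use can be illustrated with a rough estimate for a single convolutional capsule layer. This is a sketch under stated assumptions, not a measurement of SegCaps: the layer sizes are made up for illustration, the weights are assumed to be one filter bank shared across positions and child capsule types, and the stored intermediate representations ("votes" here) are assumed to be one vector per position, child type, and parent type.

```python
BYTES_PER_FLOAT32 = 4


def conv_caps_footprint(height, width, child_types, parent_types,
                        child_dim, parent_dim, kernel):
    """Return (weight_bytes, vote_bytes) for one image under the assumed layout."""
    # Assumed: a single kernel x kernel filter bank, shared by every spatial
    # position and every child capsule type.
    weights = kernel * kernel * child_dim * parent_types * parent_dim
    # Assumed: one vote vector of length parent_dim per
    # (position, child type, parent type), all kept in memory for routing.
    votes = height * width * child_types * parent_types * parent_dim
    return weights * BYTES_PER_FLOAT32, votes * BYTES_PER_FLOAT32


# Illustrative sizes only; these are not taken from the SegCaps code.
weight_bytes, vote_bytes = conv_caps_footprint(
    height=512, width=512, child_types=4, parent_types=4,
    child_dim=16, parent_dim=16, kernel=5)

print(weight_bytes / 1024)     # 100.0 (KiB of weights)
print(vote_bytes / 1024 ** 2)  # 256.0 (MiB of votes, per image)
```

Under these assumptions the weights are negligible, while the votes grow with the image area and with the product of child and parent capsule types, which is why reducing the input size or the number of capsule types lowers memory use.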